Dogfooding Guide: CPG-Powered Commit Analysis

CodeGraph analyzes its own codebase through the Code Property Graph (CPG) after commits, creating a Plan-Act-Review feedback loop. Once a review runs, Claude Code receives quality metrics, blast radius data, interface impact analysis, and a before/after comparison as context for the code that was just committed.

How It Works

The dogfooding pipeline connects three runtime pieces:

  1. GoCPG builds and maintains a Code Property Graph (DuckDB) with pre-computed metrics for every method: cyclomatic complexity, fan-in/fan-out, TODO/FIXME flags, debug code, deprecated usage.

  2. Dogfood CLI and review runtime query the CPG for changed methods, compute quality metrics, blast radius, interface impact, cross-module alerts, and persist review traces in data/reviews/ when recovery is needed.

  3. Local service readiness checks surface whether supporting services are available. dogfood status reports CPG freshness, maintenance pressure, lock diagnostics, and OpenViking availability for the local development stack.

The result: recent changes can be evaluated with a traceable quality assessment without relying on local git hooks.

Usage Scenarios

Scenario 1: Post-commit review with explicit analysis

The primary local scenario. You work in Claude Code, make changes, commit, then inspect the diff explicitly:

You: "Commit these changes and analyze the result"
Claude: git add src/intent/classifier.py && git commit -m "refactor: extract pattern table"
Claude: python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1

The review pipeline produces a report:

## Commit Analysis Report
**Summary:** 1 files, 45 methods, 2 high-CC, 3 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more

**Interface changes detected:**
- **CLI**: `src/cli/intent_commands.py` (`add_intent_commands`, `_run_classify`)

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_intent`, `register_intent_tools`

Claude sees this context and can react: “The refactoring reduced _get_fallback_domain complexity from 29 to 8. Two methods still have CC>10: classify and _score_domain. CLI was changed — check if the MCP tool needs updating.”

What runs the analysis: explicit dogfood analyze, dogfood continue, or another runtime flow that calls the same review pipeline.

What does not run it automatically: plain terminal commits, git commit --amend, and older hook-based local flows that are no longer part of the recommended workflow.

Scenario 2: Find and fix quality issues

Use CPG queries to find code quality targets, then fix them with the pipeline providing feedback:

You: "Query the CPG for methods with CC > 15 and TODO/FIXME flags in src/workflow/"
Claude: [runs DuckDB query]
  Found: _get_fallback_domain (CC=29, TODO), PolicyViolationsHandler.handle (CC=68, TODO)

You: "Refactor _get_fallback_domain to reduce complexity"
Claude: [extracts patterns to data table, replaces if/else chain with loop]

You: "Commit"
Claude: git commit -m "refactor: extract fallback patterns to class-level table"
Claude: python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1
  -> Review report shows: CC 29->8 (-21)

This is the full Plan-Act-Review loop:

  1. Plan: a CPG query identifies the problem.

  2. Act: refactoring reduces complexity.

  3. Review: the review report confirms the improvement with concrete metrics.

Scenario 3: Validate refactoring impact

Before making a large refactoring, check the blast radius:

You: "What's the blast radius if I change HierarchicalIntentClassifier.classify?"
Claude: [queries call_containment]
  213 direct callers across production code and tests

You: "Proceed with the refactoring"
Claude: [makes changes, commits]
Claude: python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1
  -> Review report shows: 213 affected callers, CC unchanged, no regressions

The blast radius report helps gauge the risk of changes before they happen.
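
The caller traversal behind a blast radius number can be pictured as a bounded breadth-first walk up the call graph. The sketch below is only an illustration: the `blast_radius` function, the in-memory `graph` dictionary, and the caller names in it are hypothetical, and the real analyzer queries call_containment in the CPG instead. The `max_depth` parameter mirrors the blast_radius_depth setting.

```python
from collections import deque
from typing import Dict, List, Set


def blast_radius(callers_of: Dict[str, List[str]], method: str, max_depth: int = 2) -> Set[str]:
    """Collect callers of `method` up to `max_depth` levels up the call graph."""
    seen: Set[str] = set()
    queue = deque([(method, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for caller in callers_of.get(current, []):
            if caller not in seen and caller != method:
                seen.add(caller)
                queue.append((caller, depth + 1))
    return seen


# Hypothetical reverse call graph: callee -> direct callers.
graph = {
    "classify": ["run_intent_classifier", "evaluate_single"],
    "run_intent_classifier": ["main"],
    "main": ["entrypoint"],
}
print(sorted(blast_radius(graph, "classify", max_depth=2)))
```

With a depth of 2 the walk reaches `main` but stops before `entrypoint`, which is why a deeper setting can inflate the caller count considerably.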

Scenario 4: On-demand analysis

Run analysis without committing:

# Analyze the last commit
python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1

# Analyze changes between branches
python -m src.cli.import_commands dogfood analyze --base-ref origin/main

# Generate a full quality report
python -m src.cli.import_commands dogfood report --format markdown

# Validate numeric claims in documentation
python -m src.cli.import_commands dogfood validate-claims --path docs/

# Show quality trend across recent commits
python -m src.cli.import_commands dogfood trend --commits 20

Pipeline Architecture

Data flow

git commit
    |
    v
Explicit review run (`dogfood analyze --base-ref HEAD~1`)
    |
    +-- CPG freshness check
    |       Query cpg_git_state.commit_hash, compare to git rev-parse HEAD
    |
    +-- Pre-update metrics capture (for delta report)
    |       Query nodes_method for changed files BEFORE CPG update
    |       Store {full_name: {cc, fan_out, ...}} for later comparison
    |
    +-- CPG update if stale
    |       gocpg update --input=<source> --output=<db>
    |
    +-- Phase 1: Get changed files
    |       git diff --name-only HEAD~1 HEAD, filter code extensions
    |
    +-- Phase 2: Get changed methods from CPG
    |       Query nodes_method for changed files, deduplicate
    |
    +-- Phase 3: Quality summary
    |       Compute high-CC, high-fan_out, TODO, debug, deprecated counts
    |
    +-- Phase 4: Blast radius
    |       Query call_containment (or nodes_call fallback) for callers
    |
    +-- Phase 5: Interface impact detection
    |       Check if changed files belong to interface layers (CLI, REST API, MCP, ACP)
    |
    +-- Phase 6: Cross-module alerts
    |       Find related functions in OTHER interface layers by keyword matching
    |
    +-- Phase 7: Story coverage delta
    |       Flag layers that changed vs layers not covered
    |
    +-- Record quality snapshot (cpg_quality_history table)
    |
    +-- Output: {"additionalContext": "## Commit Analysis Report\n..."}
            Injected back into Claude Code conversation

Timeout budget

The review runtime is budgeted to keep interactive runs bounded:

| Phase | Budget | Action |
|---|---|---|
| Freshness check | 2s | Compare cpg_git_state.commit_hash to git rev-parse HEAD |
| Pre-update metrics | ~1s | Query current metrics for changed files (for delta report) |
| CPG update if stale | 40s | Run gocpg update --input=<source> --output=<db> |
| Phases 1–7 | ~15s | Changed files, methods, quality, blast radius, interfaces, cross-module, story |

If any phase exceeds its budget, the runtime degrades gracefully: it emits whatever data it has already collected, or an empty {} if nothing is available.
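
One way to picture this degradation is a runner that checks a shared deadline between phases and keeps partial results. This is a minimal sketch under stated assumptions, not the actual runtime: `run_with_budget` and the injectable `clock` are hypothetical names, and the sketch cannot interrupt a phase that is already running.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_with_budget(
    phases: List[Tuple[str, Callable[[], object]]],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run phases in order; stop once the budget is spent and keep partial results."""
    deadline = clock() + budget_s
    results: Dict[str, object] = {}
    for name, phase in phases:
        if clock() >= deadline:
            break  # budget exhausted: return whatever has been collected
        try:
            results[name] = phase()
        except Exception:
            continue  # a failing phase is skipped rather than aborting the run
    return results
```

If the budget is already spent before the first phase starts, the result is an empty dictionary, which corresponds to the empty {} output described above.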

CPG freshness and automatic update

The runtime checks CPG freshness by comparing cpg_git_state.commit_hash to git rev-parse HEAD. If stale, it runs gocpg update:

gocpg update --input=<source_path> --output=<db_path>

Hook Infrastructure

The legacy hook/runtime support code is still relevant as an implementation detail, even though the recommended workflow is now explicit dogfood analyze / dogfood continue execution instead of background post-commit hooks.

Key helper modules in src/dogfooding/hooks/:

  • _feedback.py maps pipeline results into ReviewFinding and ReviewFeedback records.
  • _metrics.py exposes hook_metrics counters and timing helpers for local observability.
  • _utils.py provides shared helpers such as timed_hook wrappers and runtime-safe formatting.
  • _session_cache.py manages session_cache state used to correlate repeated local review runs.

These internals matter when you troubleshoot degraded hook behavior, inspect fallback paths, or compare the current explicit runtime against older hook-based experiments.

Running gocpg update triggers an incremental update of the CPG database. The CPGFreshnessChecker class in src/dogfooding/cpg_freshness.py manages both the freshness check and the update:

from src.dogfooding.cpg_freshness import CPGFreshnessChecker

checker = CPGFreshnessChecker(db_path, repo_path=".", gocpg_binary="gocpg/gocpg.exe")
checker.is_fresh()           # True if CPG commit == HEAD
checker.commits_behind()     # Number of commits CPG is behind
checker.ensure_fresh(timeout=40.0, source_path=".")  # Update if stale
checker.status()             # Full status dict

Freshness checks now include git-head fallback logic for environments where git rev-parse HEAD is unreliable in subprocesses. The checker resolves HEAD from .git/HEAD, refs, and packed-refs (including worktree indirection) before returning head_commit as unknown.
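
The fallback order described above (HEAD file, loose refs, packed-refs, worktree indirection) can be sketched with plain file reads. This is an illustration based on the standard git repository layout, not the checker's actual code: `resolve_head` and `_git_dir` are hypothetical names, and the sketch returns None where the real checker reports head_commit as unknown.

```python
from pathlib import Path
from typing import Optional


def _git_dir(repo: Path) -> Path:
    """Return the git directory, following a worktree-style `.git` file."""
    dot_git = repo / ".git"
    if dot_git.is_file():
        # Linked worktrees store "gitdir: <path>" in a plain file.
        text = dot_git.read_text(encoding="utf-8").strip()
        if text.startswith("gitdir:"):
            target = Path(text[len("gitdir:"):].strip())
            return target if target.is_absolute() else (repo / target).resolve()
    return dot_git


def resolve_head(repo: Path) -> Optional[str]:
    """Resolve HEAD to a commit hash without spawning a git subprocess."""
    git_dir = _git_dir(repo)
    head_file = git_dir / "HEAD"
    if not head_file.is_file():
        return None
    head = head_file.read_text(encoding="utf-8").strip()
    if not head.startswith("ref:"):
        return head or None  # detached HEAD stores the hash directly
    ref = head[len("ref:"):].strip()
    # Linked worktrees keep shared refs in the directory named by `commondir`.
    common = git_dir
    commondir_file = git_dir / "commondir"
    if commondir_file.is_file():
        common = (git_dir / commondir_file.read_text(encoding="utf-8").strip()).resolve()
    for base in (git_dir, common):
        loose = base / ref
        if loose.is_file():
            return loose.read_text(encoding="utf-8").strip()
    packed = common / "packed-refs"
    if packed.is_file():
        for line in packed.read_text(encoding="utf-8").splitlines():
            if line.startswith("#") or line.startswith("^"):
                continue  # header and peeled-tag lines carry no ref name
            sha, _, name = line.partition(" ")
            if name.strip() == ref:
                return sha
    return None
```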

When update fails with DuckDB lock contention, detailed diagnostics include lock classification, lock-holder PIDs, optional auto-unlock attempt results, and actionable next_step / next_command guidance.

Freshness is reported in two forms:

  • is_fresh_strict: exact commit match (cpg_commit == head_commit)
  • is_fresh: effective freshness (strict match OR no CPG-relevant file changes between commits, e.g. docs-only commits)
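
The relationship between the two flags can be written out as a small function. The sketch assumes that "CPG-relevant" means "has a code file extension"; the `freshness` function and the `CODE_EXTENSIONS` tuple are hypothetical, and the real extension list is likely longer.

```python
from typing import Dict, Iterable

# Assumption: a short illustrative subset of the code extensions the CPG tracks.
CODE_EXTENSIONS = (".py", ".go", ".c")


def freshness(cpg_commit: str, head_commit: str, changed_files: Iterable[str]) -> Dict[str, bool]:
    """Compute strict and effective freshness from the two commit hashes.

    `changed_files` lists the files that differ between cpg_commit and head_commit.
    """
    strict = cpg_commit == head_commit
    relevant = [f for f in changed_files if f.lower().endswith(CODE_EXTENSIONS)]
    return {"is_fresh_strict": strict, "is_fresh": strict or not relevant}
```

A docs-only commit therefore leaves is_fresh true while is_fresh_strict turns false, so callers that need an exact commit match should read the strict flag.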

Automatic vs explicit freshness controls

| Mechanism | Trigger | Typical use |
|---|---|---|
| dogfood status | Explicit/manual | Inspect freshness, lock diagnostics, maintenance due, and OpenViking readiness |
| dogfood analyze | Explicit/manual | Run bounded post-commit or branch-diff review |
| codegraph_watch check / codegraph_watch update | Explicit/manual | Deterministic freshness checks in headless and CI |
| CPGFreshnessChecker.ensure_fresh_with_details() | Explicit/manual | Programmatic control and machine-readable failure diagnostics |

If your workflow depends on guaranteed freshness before analysis, prefer explicit codegraph_watch update or dogfood status over any legacy background automation.

Method deduplication

GoCPG may store the same method with different filename formats (forward slash src/file.py vs backslash src\file.py). The analyzer deduplicates by normalizing full_name slashes and keeping the entry with the highest CC value:

# Before dedup: 2 entries for the same method
src\intent\classifier.py:Classifier.classify  CC=17
src/intent/classifier.py:Classifier.classify   CC=0  (from incremental update)

# After dedup: 1 entry, highest CC wins
src\intent\classifier.py:Classifier.classify  CC=17
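
The deduplication rule can be sketched in a few lines. This is an illustration rather than the analyzer's actual code: `dedupe_methods` is a hypothetical name, and each method is assumed to be a dictionary with `full_name` and `cc` keys.

```python
from typing import Dict, List


def dedupe_methods(methods: List[dict]) -> List[dict]:
    """Keep one entry per method, preferring the entry with the highest CC."""
    best: Dict[str, dict] = {}
    for method in methods:
        # Normalize path separators so both filename formats share one key.
        key = method["full_name"].replace("\\", "/")
        kept = best.get(key)
        if kept is None or method["cc"] > kept["cc"]:
            best[key] = method
    return list(best.values())


rows = [
    {"full_name": "src\\intent\\classifier.py:Classifier.classify", "cc": 17},
    {"full_name": "src/intent/classifier.py:Classifier.classify", "cc": 0},
]
print(dedupe_methods(rows))
```

The surviving entry keeps its original `full_name` spelling; only the comparison key is normalized.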

Delta report

When the CPG is stale (needs update), the runtime captures pre-update metrics before running gocpg update, then compares them against post-update metrics. This produces a delta showing the actual impact of the changes:

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)
- `_build_quality_summary`: CC 5->3 (-2)

Methods with no metric changes are omitted.
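
A minimal sketch of the comparison, assuming both snapshots use the {full_name: {cc, fan_out, ...}} shape from the data flow. The `format_deltas` function and the `LABELS` mapping are hypothetical names, and only the two metrics shown in the sample report are covered.

```python
from typing import Dict, List

# Assumption: metric key -> label used in the rendered report line.
LABELS = {"cc": "CC", "fan_out": "FanOut"}


def format_deltas(before: Dict[str, Dict[str, int]], after: Dict[str, Dict[str, int]]) -> List[str]:
    """Render one report line per method whose metrics changed."""
    lines: List[str] = []
    for name in sorted(before.keys() & after.keys()):
        parts = []
        for key, label in LABELS.items():
            old, new = before[name].get(key), after[name].get(key)
            if old is None or new is None or old == new:
                continue  # unchanged or missing metrics are not reported
            parts.append(f"{label} {old}->{new} ({new - old:+d})")
        if parts:
            lines.append(f"- `{name}`: " + ", ".join(parts))
    return lines
```

Methods that exist in only one snapshot (added or deleted in the commit) are skipped here for brevity; a fuller implementation might report them separately.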

Interface impact detection

The analyzer tracks 4 interface layers defined in INTERFACE_LAYERS:

| Layer | Path Patterns | Description |
|---|---|---|
| CLI | src/cli/ | CLI commands |
| REST API | src/api/routers/ | API endpoints |
| MCP | src/mcp/tools/, src/mcp/ | MCP tools |
| ACP | src/acp/server/, src/acp/ | ACP handlers |

When a changed file belongs to an interface layer, the report includes an “Interface changes detected” section listing affected layers and methods.
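
Layer detection can be sketched as matching a path against the patterns listed for each layer. The dictionary shape below and the `layers_for` helper are assumptions made for illustration; the real INTERFACE_LAYERS structure may differ, and prefix matching is only one plausible reading of "path patterns".

```python
from typing import Dict, List, Tuple

# Assumed shape: layer name -> path prefixes, taken from the layer listing above.
INTERFACE_LAYERS: Dict[str, Tuple[str, ...]] = {
    "CLI": ("src/cli/",),
    "REST API": ("src/api/routers/",),
    "MCP": ("src/mcp/tools/", "src/mcp/"),
    "ACP": ("src/acp/server/", "src/acp/"),
}


def layers_for(path: str) -> List[str]:
    """Return the interface layers whose path prefixes match `path`."""
    normalized = path.replace("\\", "/")
    return [
        layer
        for layer, prefixes in INTERFACE_LAYERS.items()
        if any(normalized.startswith(prefix) for prefix in prefixes)
    ]
```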

Cross-module alerts

When a file in one interface layer changes, the analyzer searches for related functions in OTHER layers by extracting keywords from the changed filename and querying the CPG. For example, if src/cli/reindex_commands.py changes, it looks for functions with “reindex” in their name in MCP, REST API, etc.

The report includes a “Cross-module alert” section suggesting which layers to check:

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_reindex`, `register_reindex_tools`
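
The keyword matching can be approximated as follows. The suffix list stripped from the filename and both helper names (`filename_keyword`, `related_functions`) are assumptions for illustration; the real analyzer queries the CPG for candidate functions rather than filtering an in-memory list.

```python
import re
from pathlib import PurePosixPath
from typing import List


def filename_keyword(path: str) -> str:
    """Derive a search keyword from a changed file, e.g. reindex_commands.py -> reindex."""
    stem = PurePosixPath(path.replace("\\", "/")).stem
    # Assumption: these suffixes are naming conventions, not part of the feature name.
    return re.sub(r"_(commands|tools|router|handlers?)$", "", stem)


def related_functions(keyword: str, candidates: List[str]) -> List[str]:
    """Return candidate function names that contain the keyword."""
    return [name for name in candidates if keyword and keyword in name.lower()]
```

For example, `filename_keyword("src/cli/reindex_commands.py")` yields `reindex`, which then matches `codegraph_reindex` and `register_reindex_tools` among MCP-layer function names.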

Story coverage delta

When interface layers are changed, the analyzer flags which other layers may need updates to maintain feature parity. The report includes a “Story coverage check” section:

**Story coverage check** — verify other interfaces:
- CLI changed (`reindex`), check: MCP, REST API, ACP
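
A sketch of how such a check line could be produced, assuming the input is a mapping from changed layer to the keyword extracted from its file. The `story_coverage_lines` function and the `ALL_LAYERS` ordering are hypothetical.

```python
from typing import Dict, List

# Assumption: layer order chosen to match the sample report line above.
ALL_LAYERS = ["CLI", "MCP", "REST API", "ACP"]


def story_coverage_lines(changed: Dict[str, str]) -> List[str]:
    """For each changed layer, list the layers that were not touched."""
    lines: List[str] = []
    for layer, keyword in changed.items():
        others = [name for name in ALL_LAYERS if name not in changed]
        if others:
            lines.append(f"- {layer} changed (`{keyword}`), check: {', '.join(others)}")
    return lines
```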

Runtime Checks

dogfood status is the primary readiness probe for the current local workflow. It reports:

  • CPG freshness and commit lag
  • maintenance pressure for the DuckDB file
  • lock diagnostics and recovery guidance
  • persisted review trace state from data/reviews/
  • OpenViking availability for the local development stack

CLI Commands

All commands are accessed via python -m src.cli.import_commands dogfood <subcommand>.

dogfood status

Check CPG freshness, review state, and local runtime readiness:

python -m src.cli.import_commands dogfood status [--db PATH]

dogfood analyze

Run commit analysis on demand:

python -m src.cli.import_commands dogfood analyze [--base-ref HEAD~1] [--db PATH]

dogfood report

Generate quality report (markdown or JSON):

python -m src.cli.import_commands dogfood report [--format markdown|json] [--db PATH]

dogfood validate-claims

Validate numeric claims in documentation against the CPG. Extracts numbers from markdown (e.g., “95 handlers”, “12 scenarios”) and verifies via SQL:

python -m src.cli.import_commands dogfood validate-claims [--path PATH] [--db PATH]

Claim rules are defined in config.yaml under dogfooding.claims_validation.rules[]. Each rule maps keywords (English + Russian) to a SQL query:

claims_validation:
  enabled: true
  timeout: 5.0
  rules:
    - keywords: ["handlers", "обработчиков"]
      sql: "SELECT COUNT(DISTINCT full_name) FROM nodes_method WHERE ..."
      description: "Scenario handler methods"
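
The extraction step can be illustrated with a regular expression that pairs a number with one of a rule's keywords. This is a simplified sketch: `extract_claims` is a hypothetical helper, and the real validator may tokenize markdown differently before running each rule's SQL.

```python
import re
from typing import List, Tuple


def extract_claims(markdown: str, keywords: List[str]) -> List[Tuple[int, str]]:
    """Return (number, keyword) pairs for phrases such as '95 handlers'."""
    if not keywords:
        return []
    alternation = "|".join(re.escape(keyword) for keyword in keywords)
    pattern = re.compile(rf"\b(\d+)\s+({alternation})\b", re.IGNORECASE)
    return [(int(number), word.lower()) for number, word in pattern.findall(markdown)]
```

Each extracted number would then be compared with the count returned by the rule's SQL query, and a mismatch reported as a stale claim.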

dogfood trend

Show quality trend across recent commits from the cpg_quality_history table:

python -m src.cli.import_commands dogfood trend [--commits N] [--db PATH]

Output is an ASCII table with columns: Commit, Date, Methods, Avg CC, Dead, Hi-CC, TODO.

Quality snapshots are recorded automatically after each commit analysis via record_snapshot() in src/dogfooding/quality_history.py.

dogfood validate-stories

Validate user story interface coverage via CPG using StoryValidationRunner:

python -m src.cli.import_commands dogfood validate-stories [--stories 2,8,11] [--path FILE] [--output FILE] [--db PATH] [--go-db PATH]

dogfood config-check

Detect orphan configuration parameters by cross-referencing YAML config, schema, and code usage:

python -m src.cli.import_commands dogfood config-check [--format text|json|csv] [--level error|warning|info|all] [--fix-suggestions] [--config PATH] [--schema PATH] [--source DIR...]

| Parameter | Default | Description |
|---|---|---|
| --format | text | Output format: text, json, or csv |
| --level | all | Minimum severity level to show |
| --fix-suggestions | off | Show fix suggestions for each finding |
| --config | config.yaml | Path to YAML config file |
| --schema | src/config/unified_config.py | Path to schema file |
| --source | src/ | Source directories to scan (multiple allowed) |

Detects 6 orphan types: yaml_unused, yaml_missing, code_orphan, path_mismatch, orphaned_dataclass, unused_default. Uses ConfigOrphanAnalyzer from src/analysis/config_analyzer.py.

dogfood maintain-db

Perform routine CPG maintenance and cleanup:

python -m src.cli.import_commands dogfood maintain-db [--db PATH] [--force] [--json]

| Parameter | Default | Description |
|---|---|---|
| --db | auto-detected | DuckDB database path |
| --force | off | Continue even when the command detects a risky state |
| --json | off | Return machine-readable maintenance details |

Use this command when quality history tables, review traces, or stale maintenance markers need a controlled cleanup step.

dogfood continue

Resume an interrupted dogfooding workflow from the stored review state:

python -m src.cli.import_commands dogfood continue [--db PATH] [--review-dir PATH] [--json]

| Parameter | Default | Description |
|---|---|---|
| --db | auto-detected | DuckDB database path |
| --review-dir | data/reviews | Directory with persisted review state |
| --json | off | Return machine-readable status for automation |

This is the recovery path when a post-commit review was interrupted and you want to continue from the last saved checkpoint instead of starting over.

Configuration

In config.yaml:

dogfooding:
  enabled: true
  auto_update_cpg: true          # Run gocpg update if CPG is stale
  cpg_update_timeout: 40         # Seconds for CPG update
  analysis_timeout: 16           # Seconds for quality + blast radius
  cc_threshold: 10               # Flag methods with CC above this
  fan_out_threshold: 30          # Flag methods with fan_out above this
  blast_radius_depth: 2          # Max depth for caller traversal
  max_files_per_commit: 15       # Max files to analyze per commit
  report_format: markdown        # markdown or json
  record_quality_history: true   # Record QualitySnapshot per commit
  quality_history_db_path: data/quality_history.duckdb  # Optional separate DB for snapshots
  include_paths:                 # Limit dogfooding to selected source roots
    - src
    - tests
  exclude_paths:                 # Skip generated or third-party code
    - .venv
    - node_modules
  claims_validation:
    enabled: true
    timeout: 5.0                 # Seconds per claim query
    rules:                       # Keyword→SQL mappings for validate-claims
      - keywords: ["handlers", "обработчиков"]
        sql: "SELECT COUNT(...) FROM nodes_method WHERE ..."
        description: "Scenario handler methods"
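
To show how cc_threshold and fan_out_threshold might be applied, here is a minimal sketch of the flagging step. The `quality_summary` function and the `name`, `cc`, and `fan_out` keys are assumptions; the comparison is strict because the config comments say "above this".

```python
from typing import Dict, List


def quality_summary(
    methods: List[dict],
    cc_threshold: int = 10,
    fan_out_threshold: int = 30,
) -> Dict[str, List[str]]:
    """Flag methods whose metrics are strictly above the configured thresholds."""
    return {
        "high_cc": [m["name"] for m in methods if m.get("cc", 0) > cc_threshold],
        "high_fan_out": [m["name"] for m in methods if m.get("fan_out", 0) > fan_out_threshold],
    }
```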

CommitReport

The CommitReport dataclass (src/dogfooding/commit_analyzer.py) holds the full analysis result:

| Field | Type | Description |
|---|---|---|
| changed_files | List[str] | Code files changed in the commit |
| changed_methods | List[dict] | Methods in changed files (deduplicated) |
| blast_radius | Dict | {"callers": {method: [callers]}, "total_affected": N} |
| quality_summary | Dict | High-CC, high-fan_out, TODO, debug, deprecated counts |
| interface_impacts | List[dict] | Interface layers affected (CLI, REST API, MCP, ACP) |
| cross_module_alerts | List[dict] | Related functions in other interface layers |
| story_coverage_delta | List[dict] | Story coverage gaps across layers |
| is_cpg_fresh | bool | Whether CPG was up-to-date |
| analysis_time_ms | int | Total analysis time in milliseconds |
| deltas | List[dict] | Before→after metric changes |
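
For orientation, the fields above translate into a dataclass along these lines. This is a sketch reconstructed from the field listing only: the class name `CommitReportSketch` is deliberately different from the real class, and all default values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommitReportSketch:
    """Illustrative mirror of the documented CommitReport fields (defaults assumed)."""

    changed_files: List[str] = field(default_factory=list)
    changed_methods: List[dict] = field(default_factory=list)
    blast_radius: Dict = field(default_factory=dict)
    quality_summary: Dict = field(default_factory=dict)
    interface_impacts: List[dict] = field(default_factory=list)
    cross_module_alerts: List[dict] = field(default_factory=list)
    story_coverage_delta: List[dict] = field(default_factory=list)
    is_cpg_fresh: bool = True
    analysis_time_ms: int = 0
    deltas: List[dict] = field(default_factory=list)


report = CommitReportSketch(
    changed_files=["src/intent/classifier.py"],
    blast_radius={"callers": {"classify": ["run_intent_classifier"]}, "total_affected": 1},
)
print(report.blast_radius["total_affected"])
```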

Report Format

The review pipeline returns a markdown report as additionalContext in JSON:

{"additionalContext": "## Commit Analysis Report\n**Summary:** ..."}

Full report structure (sections are omitted when empty):

## Commit Analysis Report
**Summary:** 3 files, 45 methods, 2 high-CC, 1 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**High fan-out methods:**
- `classify` (fan_out: 39)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more
- `_classify_domain` called by: `classify`, `get_morph` +4 more

**Interface changes detected:**
- **CLI**: `src/cli/intent_commands.py` (`add_intent_commands`, `_run_classify`)

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_intent`, `register_intent_tools`

**Story coverage check** — verify other interfaces:
- CLI changed (`intent`), check: MCP, REST API, ACP

*Analysis completed in 95ms*
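
The "sections are omitted when empty" rule and the JSON wrapping can be sketched together. The `render_report` function and its arguments are hypothetical; the real renderer produces more sections and its own formatting details.

```python
import json
from typing import Dict, List


def render_report(summary: str, sections: Dict[str, List[str]]) -> str:
    """Build the markdown report and wrap it as an additionalContext JSON payload."""
    lines = ["## Commit Analysis Report", f"**Summary:** {summary}"]
    for title, items in sections.items():
        if not items:
            continue  # empty sections are left out of the report
        lines.append("")
        lines.append(f"**{title}:**")
        lines.extend(f"- {item}" for item in items)
    payload = {"additionalContext": "\n".join(lines)}
    return json.dumps(payload, ensure_ascii=False)
```

Passing `ensure_ascii=False` keeps characters such as the arrow in cross-module alerts readable in the emitted JSON.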

Scaling to Other Projects

The dogfooding pipeline is project-agnostic. To set up for any project:

  1. Import the project to create a CPG database:

       python -m src.cli import /path/to/project --language python

  2. Register the project in config.yaml:

       projects:
         active: my_project
         registry:
           my_project:
             db_path: data/projects/my_project.duckdb
             source_path: /path/to/project
             language: python
             domain: python_generic

  3. Verify the local runtime:

       python -m src.cli.import_commands dogfood status --db data/projects/my_project.duckdb
       python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1 --db data/projects/my_project.duckdb

The dogfood commands read the active project from config.yaml and resolve the correct database path automatically.

Troubleshooting

Analysis produces empty {} output:

  • Check that the database file exists at the configured db_path.
  • Verify the active project in config.yaml has a valid db_path.
  • Run python -m src.cli.import_commands dogfood status to check freshness.
  • Ensure the commit changed code files (.py, .go, .c, etc.), not just docs or configs.

CPG always shows stale:

  • Ensure the gocpg binary exists at gocpg/gocpg.exe (or the configured GOCPG_PATH).
  • Run python -m src.cli.import_commands dogfood status and inspect recommended_next_action.
  • Try a manual update: gocpg/gocpg.exe update --input=. --output=<db>

CC values are 0 after incremental update:

  • Incremental gocpg update may skip MethodMetricsPass for some entries. New entries can have cyclomatic_complexity=0.
  • The deduplication logic keeps the entry with the highest CC value, mitigating this.
  • If the problem persists, re-import the project from scratch: python -m src.cli import /path/to/source

DuckDB lock error (“file is being used by another process”):

  • Another gocpg.exe process is running (for example from gocpg watch or a concurrent refresh).
  • The runtime uses read-only connections and handles lock errors gracefully, falling back to subprocess queries.
  • codegraph_watch update / ensure_fresh_with_details() return lock diagnostics (failure_kind=db_lock, locker_pids, auto_unlock_*, next_command) to speed up recovery.
  • If the locker PID is the current Python process, auto-unlock intentionally skips killing itself; run the suggested next_command after closing the locker.

Delta report not appearing:

  • The delta report only appears when the CPG was stale before the update (pre-update metrics were captured).
  • If the CPG is already fresh (e.g., gocpg watch updated it), there are no pre-update metrics to compare against.

Timeout exceeded:

  • The 58s budget (60s Claude Code limit minus 2s margin) accommodates most commits. For very large projects, gocpg update may exceed the 40s phase budget.
  • Reduce max_files_per_commit in config.
  • Ensure GoCPG indexes are up to date: gocpg/gocpg.exe index --db=<db>

OpenViking is missing from status:

  • Start the local stack and confirm the OpenViking service is listening on the configured port.
  • Run python -m src.cli.import_commands dogfood status again and inspect the openviking_status section.