Dogfooding Guide: CPG-Powered Commit Analysis

CodeGraph analyzes its own codebase through the Code Property Graph after every commit, creating a Plan-Act-Review feedback loop. Claude Code receives quality metrics, blast radius data, interface impact analysis, and before/after comparison as context immediately after committing code.

Table of Contents

How It Works
Usage Scenarios
Scenario 1: Post-commit review with explicit analysis
Scenario 2: Find and fix quality issues
Scenario 3: Validate refactoring impact
Scenario 4: On-demand analysis
Pipeline Architecture
Data flow
Timeout budget
CPG freshness and automatic update
Automatic vs explicit freshness controls
Method deduplication
Delta report
Interface impact detection
Cross-module alerts
Story coverage delta
Hook Infrastructure
CLI Commands
dogfood status
dogfood analyze
dogfood report
dogfood validate-claims
dogfood trend
dogfood validate-stories
dogfood config-check
dogfood maintain-db
dogfood continue
Configuration
CommitReport
Report Format
Scaling to Other Projects
Troubleshooting

How It Works

The dogfooding pipeline connects three runtime pieces:

GoCPG builds and maintains a Code Property Graph (DuckDB) with pre-computed metrics for every method: cyclomatic complexity, fan-in/fan-out, TODO/FIXME flags, debug code, deprecated usage.
Dogfood CLI and review runtime query the CPG for changed methods, compute quality metrics, blast radius, interface impact, cross-module alerts, and persist review traces in data/reviews/ when recovery is needed.
Local service readiness checks surface whether supporting services are available. dogfood status reports CPG freshness, maintenance pressure, lock diagnostics, and OpenViking availability for the local development contour.

The result: recent changes can be evaluated with a traceable quality assessment without relying on local git hooks.

Usage Scenarios

Scenario 1: Post-commit review with explicit analysis

The primary local scenario. You work in Claude Code, make changes, commit, then inspect the diff explicitly:

You: "Commit these changes and analyze the result"
Claude: git add src/intent/classifier.py && git commit -m "refactor: extract pattern table"
Claude: python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1

The review pipeline produces a report:

## Commit Analysis Report
**Summary:** 1 files, 45 methods, 2 high-CC, 3 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more

**Interface changes detected:**
- **CLI**: `src/cli/intent_commands.py` (`add_intent_commands`, `_run_classify`)

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_intent`, `register_intent_tools`

Claude sees this context and can react: “The refactoring reduced _get_fallback_domain complexity from 29 to 8. Two methods still have CC>10: classify and _score_domain. CLI was changed — check if the MCP tool needs updating.”

What runs the analysis: explicit dogfood analyze, dogfood continue, or another runtime flow that calls the same review pipeline.

What does not run it automatically: plain terminal commits, git commit --amend, and older hook-based local flows that are no longer part of the recommended workflow.

Scenario 2: Find and fix quality issues

Use CPG queries to find code quality targets, then fix them with the pipeline providing feedback:

You: "Query the CPG for methods with CC > 15 and TODO/FIXME flags in src/workflow/"
Claude: [runs DuckDB query]
  Found: _get_fallback_domain (CC=29, TODO), PolicyViolationsHandler.handle (CC=68, TODO)

You: "Refactor _get_fallback_domain to reduce complexity"
Claude: [extracts patterns to data table, replaces if/else chain with loop]

You: "Commit"
Claude: git commit -m "refactor: extract fallback patterns to class-level table"
Claude: python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1
  -> Review report shows: CC 29->8 (-21)

This is the full Plan-Act-Review loop:

1. Plan: a CPG query identifies the problem.
2. Act: refactoring reduces complexity.
3. Review: the review report confirms the improvement with concrete metrics.

Scenario 3: Validate refactoring impact

Before making a large refactoring, check the blast radius:

You: "What's the blast radius if I change HierarchicalIntentClassifier.classify?"
Claude: [queries call_containment]
  213 direct callers across production code and tests

You: "Proceed with the refactoring"
Claude: [makes changes, commits]
Claude: python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1
  -> Review report shows: 213 affected callers, CC unchanged, no regressions

The blast radius report helps gauge the risk of changes before they happen.
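The caller traversal behind a blast radius number can be sketched as a bounded breadth-first search. This is a minimal illustration, not the pipeline's code: it assumes the call graph has already been loaded into an in-memory dict (the real pipeline queries call_containment), and `blast_radius` and `callers_of` are hypothetical names. The default depth mirrors `blast_radius_depth: 2` from the configuration.

```python
from collections import deque
from typing import Dict, List, Set


def blast_radius(callers_of: Dict[str, List[str]], method: str, max_depth: int = 2) -> Set[str]:
    """Collect callers of a method up to max_depth levels, breadth-first."""
    affected: Set[str] = set()
    queue = deque([(method, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand callers beyond the configured depth
        for caller in callers_of.get(current, []):
            if caller not in affected and caller != method:
                affected.add(caller)
                queue.append((caller, depth + 1))
    return affected
```

Bounding the depth keeps the traversal cheap on hub methods such as `classify`, at the cost of missing callers further up the chain.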

Scenario 4: On-demand analysis

Run analysis without committing:

# Analyze the last commit
python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1

# Analyze changes between branches
python -m src.cli.import_commands dogfood analyze --base-ref origin/main

# Generate a full quality report
python -m src.cli.import_commands dogfood report --format markdown

# Validate numeric claims in documentation
python -m src.cli.import_commands dogfood validate-claims --path docs/

# Show quality trend across recent commits
python -m src.cli.import_commands dogfood trend --commits 20

Pipeline Architecture

Data flow

git commit
    |
    v
Explicit review run (`dogfood analyze --base-ref HEAD~1`)
    |
    +-- CPG freshness check
    |       Query cpg_git_state.commit_hash, compare to git rev-parse HEAD
    |
    +-- Pre-update metrics capture (for delta report)
    |       Query nodes_method for changed files BEFORE CPG update
    |       Store {full_name: {cc, fan_out, ...}} for later comparison
    |
    +-- CPG update if stale
    |       gocpg update --input=<source> --output=<db>
    |
    +-- Phase 1: Get changed files
    |       git diff --name-only HEAD~1 HEAD, filter code extensions
    |
    +-- Phase 2: Get changed methods from CPG
    |       Query nodes_method for changed files, deduplicate
    |
    +-- Phase 3: Quality summary
    |       Compute high-CC, high-fan_out, TODO, debug, deprecated counts
    |
    +-- Phase 4: Blast radius
    |       Query call_containment (or nodes_call fallback) for callers
    |
    +-- Phase 5: Interface impact detection
    |       Check if changed files belong to interface layers (CLI, REST API, MCP, ACP)
    |
    +-- Phase 6: Cross-module alerts
    |       Find related functions in OTHER interface layers by keyword matching
    |
    +-- Phase 7: Story coverage delta
    |       Flag layers that changed vs layers not covered
    |
    +-- Record quality snapshot (cpg_quality_history table)
    |
    +-- Output: {"additionalContext": "## Commit Analysis Report\n..."}
            Injected back into Claude Code conversation

Timeout budget

The review runtime is budgeted to keep interactive runs bounded:

Phase	Budget	Action
Freshness check	2s	Compare `cpg_git_state.commit_hash` to `git rev-parse HEAD`
Pre-update metrics	~1s	Query current metrics for changed files (for delta report)
CPG update if stale	40s	Run `gocpg update --input=<source> --output=<db>`
Phases 1–7	~15s	Changed files, methods, quality, blast radius, interfaces, cross-module, story

If any phase exceeds its budget, the runtime degrades gracefully: it emits whatever data it has collected so far, or an empty {} if nothing is available.
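The degradation rule can be sketched as follows. This is an illustration under stated assumptions, not the runtime's implementation: `run_with_budget` and the `Phase` tuple are hypothetical, and the budget is checked cooperatively after each phase returns instead of interrupting a running phase.

```python
import time
from typing import Callable, Dict, List, Tuple

Phase = Tuple[str, float, Callable[[], dict]]  # (name, budget in seconds, work)


def run_with_budget(phases: List[Phase]) -> dict:
    """Run phases in order; stop at the first failure or budget overrun."""
    result: Dict[str, dict] = {}
    for name, budget, work in phases:
        started = time.monotonic()
        try:
            data = work()
        except Exception:
            break  # degrade gracefully: keep what earlier phases produced
        if time.monotonic() - started > budget:
            break  # over budget: discard this phase and stop
        result[name] = data
    return result
```

If the first phase fails or overruns, the function returns an empty dict, which corresponds to the empty {} output described above.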

CPG freshness and automatic update

The runtime checks CPG freshness by comparing cpg_git_state.commit_hash to git rev-parse HEAD. If stale, it runs gocpg update:

gocpg update --input=<source_path> --output=<db_path>
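The freshness comparison can be sketched as follows. This is a minimal illustration, not the CPGFreshnessChecker implementation: `resolve_head` and `is_fresh_strict` are hypothetical names, the sketch reads HEAD from files under .git instead of calling git rev-parse HEAD, and it does not handle worktree indirection.

```python
from pathlib import Path
from typing import Optional


def resolve_head(git_dir: Path) -> Optional[str]:
    """Resolve HEAD to a commit hash by reading files under .git directly."""
    head_file = git_dir / "HEAD"
    if not head_file.is_file():
        return None
    head = head_file.read_text(encoding="utf-8").strip()
    if not head.startswith("ref: "):
        return head or None  # detached HEAD: the file holds the hash itself
    ref_name = head[len("ref: "):]
    loose_ref = git_dir / ref_name
    if loose_ref.is_file():
        return loose_ref.read_text(encoding="utf-8").strip() or None
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text(encoding="utf-8").splitlines():
            # Skip the header comment and peeled-tag lines ("^<hash>").
            if not line or line.startswith(("#", "^")):
                continue
            commit, _, name = line.partition(" ")
            if name == ref_name:
                return commit
    return None


def is_fresh_strict(cpg_commit: Optional[str], head_commit: Optional[str]) -> bool:
    """True only when the CPG was built at exactly the current HEAD."""
    return bool(cpg_commit) and cpg_commit == head_commit
```

A missing or unresolvable HEAD yields None, so the comparison reports the CPG as stale and never guesses.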

Running gocpg update performs an incremental update of the CPG database. The CPGFreshnessChecker class in src/dogfooding/cpg_freshness.py manages both the freshness check and the update:

from src.dogfooding.cpg_freshness import CPGFreshnessChecker

checker = CPGFreshnessChecker(db_path, repo_path=".", gocpg_binary="gocpg/gocpg.exe")
checker.is_fresh()           # True if CPG commit == HEAD
checker.commits_behind()     # Number of commits CPG is behind
checker.ensure_fresh(timeout=40.0, source_path=".")  # Update if stale
checker.status()             # Full status dict

Freshness checks include git-head fallback logic for environments where git rev-parse HEAD is unreliable in subprocesses. The checker resolves HEAD from .git/HEAD, refs, and packed-refs (including worktree indirection) before returning head_commit as unknown.

When an update fails with DuckDB lock contention, detailed diagnostics include lock classification, lock-holder PIDs, optional auto-unlock attempt results, and actionable next_step / next_command guidance.

Freshness is reported in two forms:

- is_fresh_strict: exact commit match (cpg_commit == head_commit)
- is_fresh: effective freshness (strict match OR no CPG-relevant file changes between commits, e.g. docs-only commits)

Hook Infrastructure

The legacy hook/runtime support code is still relevant as implementation detail even though the recommended workflow is now explicit dogfood analyze / dogfood continue execution instead of background post-commit hooks.

Key helper modules in src/dogfooding/hooks/:

_feedback.py maps pipeline results into ReviewFinding and ReviewFeedback records.
_metrics.py exposes hook_metrics counters and timing helpers for local observability.
_utils.py provides shared helpers such as timed_hook wrappers and runtime-safe formatting.
_session_cache.py manages session_cache state used to correlate repeated local review runs.

These internals matter when you troubleshoot degraded hook behavior, inspect fallback paths, or compare the current explicit runtime against older hook-based experiments.

Automatic vs explicit freshness controls

Mechanism	Trigger	Typical use
`dogfood status`	Explicit/manual	Inspect freshness, lock diagnostics, maintenance due, and OpenViking readiness
`dogfood analyze`	Explicit/manual	Run bounded post-commit or branch-diff review
`codegraph_runtime_cpg_watch_run check` / `codegraph_runtime_cpg_watch_run update`	Explicit/manual	Deterministic freshness checks in headless and CI
`CPGFreshnessChecker.ensure_fresh_with_details()`	Explicit/manual	Programmatic control and machine-readable failure diagnostics

If your workflow depends on guaranteed freshness before analysis, prefer explicit codegraph_runtime_cpg_watch_run update or dogfood status over any legacy background automation.
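When gating analysis on freshness, the difference between is_fresh_strict and is_fresh matters mostly for docs-only commits. The sketch below illustrates the distinction; `freshness` is a hypothetical function and `CODE_EXTENSIONS` is an assumed, partial list of CPG-relevant extensions, not the checker's actual rule.

```python
from typing import Iterable

# Assumption: file extensions that count as CPG-relevant; the real list may differ.
CODE_EXTENSIONS = (".py", ".go", ".c")


def freshness(cpg_commit: str, head_commit: str, changed_files: Iterable[str]) -> dict:
    """Return both freshness forms for a CPG built at cpg_commit."""
    strict = cpg_commit == head_commit
    # Effective freshness: the commits differ, but nothing the CPG indexes changed.
    relevant = [f for f in changed_files if f.endswith(CODE_EXTENSIONS)]
    return {"is_fresh_strict": strict, "is_fresh": strict or not relevant}
```

Here `changed_files` stands for the files that differ between the CPG commit and HEAD. A workflow that needs an exact match should test is_fresh_strict; one that only needs accurate metrics can accept is_fresh.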

Method deduplication

GoCPG may store the same method with different filename formats (forward slash src/file.py vs backslash src\file.py). The analyzer deduplicates by normalizing full_name slashes and keeping the entry with the highest CC value:

# Before dedup: 2 entries for the same method
src\intent\classifier.py:Classifier.classify  CC=17
src/intent/classifier.py:Classifier.classify   CC=0  (from incremental update)

# After dedup: 1 entry, highest CC wins
src\intent\classifier.py:Classifier.classify  CC=17
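The rule above can be sketched in a few lines. This is an illustration, not the analyzer's code: `dedupe_methods` is a hypothetical name, and each entry is assumed to be a dict with `full_name` and `cc` keys.

```python
from typing import Dict, Iterable, List


def dedupe_methods(methods: Iterable[dict]) -> List[dict]:
    """Keep one entry per method, preferring the highest cyclomatic complexity."""
    best: Dict[str, dict] = {}
    for method in methods:
        key = method["full_name"].replace("\\", "/")  # normalize path separators
        current = best.get(key)
        if current is None or method["cc"] > current["cc"]:
            best[key] = method
    return list(best.values())


entries = [
    {"full_name": "src\\intent\\classifier.py:Classifier.classify", "cc": 17},
    {"full_name": "src/intent/classifier.py:Classifier.classify", "cc": 0},
]
unique = dedupe_methods(entries)  # one entry remains, the one with cc == 17
```

Only the lookup key is normalized; the surviving entry keeps its original full_name, which matches the example where the backslash form is retained.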

Delta report

When the CPG is stale (needs update), the runtime captures pre-update metrics before running gocpg update, then compares them against post-update metrics. This produces a delta showing the actual impact of changes:

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)
- `_build_quality_summary`: CC 5->3 (-2)

Methods with no metric changes are omitted.
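A minimal sketch of the comparison, assuming both snapshots are stored as `{full_name: {metric: value}}` dicts as in the data flow. `compute_deltas` and `format_delta` are hypothetical names, and the output uses the raw metric keys instead of the report's display labels.

```python
from typing import Dict, List

Metrics = Dict[str, Dict[str, int]]  # {full_name: {"cc": ..., "fan_out": ...}}


def compute_deltas(before: Metrics, after: Metrics) -> List[dict]:
    """Compare pre-update and post-update metrics; skip unchanged methods."""
    deltas = []
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # method did not exist before the update; nothing to compare
        changes = {
            key: (old[key], new[key])
            for key in new
            if key in old and old[key] != new[key]
        }
        if changes:
            deltas.append({"method": name, "changes": changes})
    return deltas


def format_delta(delta: dict) -> str:
    """Render one delta as a report line with signed differences."""
    parts = [
        f"{key} {old}->{new} ({new - old:+d})"
        for key, (old, new) in delta["changes"].items()
    ]
    return f"- `{delta['method']}`: " + ", ".join(parts)
```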

Interface impact detection

The analyzer tracks 4 interface layers defined in INTERFACE_LAYERS:

Layer	Path Patterns	Description
CLI	`src/cli/`	CLI commands
REST API	`src/api/routers/`	API endpoints
MCP	`src/mcp/tools/`, `src/mcp/`	MCP tools
ACP	`src/acp/server/`, `src/acp/`	ACP handlers

When a changed file belongs to an interface layer, the report includes an “Interface changes detected” section listing affected layers and methods.
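The layer lookup can be sketched as a prefix match. The path prefixes come from the table above, but the dict layout of `INTERFACE_LAYERS` shown here and the helper names `layer_for` and `interface_impacts` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Path prefixes taken from the table above; the real INTERFACE_LAYERS structure may differ.
INTERFACE_LAYERS: Dict[str, Tuple[str, ...]] = {
    "CLI": ("src/cli/",),
    "REST API": ("src/api/routers/",),
    "MCP": ("src/mcp/tools/", "src/mcp/"),
    "ACP": ("src/acp/server/", "src/acp/"),
}


def layer_for(path: str) -> Optional[str]:
    """Return the interface layer a file belongs to, or None."""
    normalized = path.replace("\\", "/")
    for layer, prefixes in INTERFACE_LAYERS.items():
        if normalized.startswith(prefixes):
            return layer
    return None


def interface_impacts(changed_files: List[str]) -> Dict[str, List[str]]:
    """Group changed files by the interface layer they touch."""
    impacts: Dict[str, List[str]] = {}
    for path in changed_files:
        layer = layer_for(path)
        if layer is not None:
            impacts.setdefault(layer, []).append(path)
    return impacts
```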

Cross-module alerts

When a file in one interface layer changes, the analyzer searches for related functions in OTHER layers by extracting keywords from the changed filename and querying the CPG. For example, if src/cli/reindex_commands.py changes, it looks for functions with “reindex” in their name in MCP, REST API, etc.

The report includes a “Cross-module alert” section suggesting which layers to check:

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_reindex`, `register_reindex_tools`
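A rough sketch of the keyword matching, under stated assumptions: the suffix list in `GENERIC_SUFFIXES` is invented for illustration, the function names per layer are passed in as a plain dict where the analyzer would query the CPG, and `extract_keyword` and `cross_module_alerts` are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Assumption: generic filename suffixes that carry no feature meaning.
GENERIC_SUFFIXES = ("_commands", "_tools", "_router", "_handlers")


def extract_keyword(path: str) -> str:
    """Derive a feature keyword from a changed file name."""
    stem = PurePosixPath(path.replace("\\", "/")).stem
    for suffix in GENERIC_SUFFIXES:
        if stem.endswith(suffix):
            return stem[: -len(suffix)]
    return stem


def cross_module_alerts(changed_layer: str, keyword: str,
                        functions_by_layer: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Find functions in other layers whose names contain the keyword."""
    alerts = {}
    for layer, names in functions_by_layer.items():
        if layer == changed_layer:
            continue  # only other layers are candidates for an alert
        matches = [n for n in names if keyword in n.lower()]
        if matches:
            alerts[layer] = matches
    return alerts
```

Substring matching is deliberately loose: it can produce false positives, which suits a report that only suggests where to look.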

Story coverage delta

When interface layers are changed, the analyzer flags which other layers may need updates to maintain feature parity. The report includes a “Story coverage check” section:

**Story coverage check** — verify other interfaces:
- CLI changed (`reindex`), check: MCP, REST API, ACP

Runtime Checks

dogfood status is the primary readiness probe for the current local workflow. It reports:

CPG freshness and commit lag
maintenance pressure for the DuckDB file
lock diagnostics and recovery guidance
persisted review trace state from data/reviews/
OpenViking availability for the local development stack

CLI Commands

All commands are accessed via python -m src.cli.import_commands dogfood <subcommand>.

dogfood status

Check CPG freshness, review state, and local runtime readiness:

python -m src.cli.import_commands dogfood status [--db PATH]

dogfood analyze

Run commit analysis on demand:

python -m src.cli.import_commands dogfood analyze [--base-ref HEAD~1] [--db PATH]

dogfood report

Generate quality report (markdown or JSON):

python -m src.cli.import_commands dogfood report [--format markdown|json] [--db PATH]

dogfood validate-claims

Validate numeric claims in documentation against the CPG. Extracts numbers from markdown (e.g., “95 handlers”, “12 scenarios”) and verifies via SQL:

python -m src.cli.import_commands dogfood validate-claims [--path PATH] [--db PATH]

Claim rules are defined in config.yaml → dogfooding.claims_validation.rules[]. Each rule maps keywords (English + Russian) to a SQL query:

claims_validation:
  enabled: true
  timeout: 5.0
  rules:
    - keywords: ["handlers", "обработчиков"]
      sql: "SELECT COUNT(DISTINCT full_name) FROM nodes_method WHERE ..."
      description: "Scenario handler methods"
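The extraction step can be sketched with a regular expression. This is an assumption-laden illustration, not the validator's code: `CLAIM_PATTERN` and `extract_claims` are hypothetical, and the sketch only pairs each number with the single word that follows it. Each returned claim would then be compared against the result of the rule's SQL query.

```python
import re
from typing import Dict, List, Tuple

# Matches "<number> <word>" pairs such as "95 handlers"; letters may be non-ASCII.
CLAIM_PATTERN = re.compile(r"\b(\d+)\s+([^\W\d_]+)")


def extract_claims(text: str, rules: List[dict]) -> List[Tuple[int, dict]]:
    """Return (claimed_number, rule) for every numeric claim a rule covers."""
    by_keyword: Dict[str, dict] = {
        keyword.lower(): rule for rule in rules for keyword in rule["keywords"]
    }
    claims = []
    for number, word in CLAIM_PATTERN.findall(text):
        rule = by_keyword.get(word.lower())
        if rule is not None:
            claims.append((int(number), rule))
    return claims
```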

dogfood trend

Show quality trend across recent commits from the cpg_quality_history table:

python -m src.cli.import_commands dogfood trend [--commits N] [--db PATH]

Output is an ASCII table with columns: Commit, Date, Methods, Avg CC, Dead, Hi-CC, TODO.

Quality snapshots are recorded automatically after each commit analysis via record_snapshot() in src/dogfooding/quality_history.py.

dogfood validate-stories

Validate user story interface coverage via CPG using StoryValidationRunner:

python -m src.cli.import_commands dogfood validate-stories [--stories 2,8,11] [--path FILE] [--output FILE] [--db PATH] [--go-db PATH]

dogfood config-check

Detect orphan configuration parameters by cross-referencing YAML config, schema, and code usage:

python -m src.cli.import_commands dogfood config-check [--format text|json|csv] [--level error|warning|info|all] [--fix-suggestions] [--config PATH] [--schema PATH] [--source DIR...]

Parameter	Default	Description
`--format`	`text`	Output format: `text`, `json`, or `csv`
`--level`	`all`	Minimum severity level to show
`--fix-suggestions`	off	Show fix suggestions for each finding
`--config`	`config.yaml`	Path to YAML config file
`--schema`	`src/config/unified_config.py`	Path to schema file
`--source`	`src/`	Source directories to scan (multiple allowed)

Detects 6 orphan types: yaml_unused, yaml_missing, code_orphan, path_mismatch, orphaned_dataclass, unused_default. Uses ConfigOrphanAnalyzer from src/analysis/config_analyzer.py.

dogfood maintain-db

Perform routine CPG maintenance and cleanup:

python -m src.cli.import_commands dogfood maintain-db [--db PATH] [--force] [--json]

Parameter	Default	Description
`--db`	auto-detected	DuckDB database path
`--force`	off	Continue even when the command detects a risky state
`--json`	off	Return machine-readable maintenance details

Use this command when quality history tables, review traces, or stale maintenance markers need a controlled cleanup step.

dogfood continue

Resume an interrupted dogfooding workflow from the stored review state:

python -m src.cli.import_commands dogfood continue [--db PATH] [--review-dir PATH] [--json]

Parameter	Default	Description
`--db`	auto-detected	DuckDB database path
`--review-dir`	`data/reviews`	Directory with persisted review state
`--json`	off	Return machine-readable status for automation

This is the recovery path when a post-commit review was interrupted and you want to continue from the last saved checkpoint instead of starting over.

Configuration

In config.yaml:

dogfooding:
  enabled: true
  auto_update_cpg: true          # Run gocpg update if CPG is stale
  cpg_update_timeout: 40         # Seconds for CPG update
  analysis_timeout: 16           # Seconds for quality + blast radius
  cc_threshold: 10               # Flag methods with CC above this
  fan_out_threshold: 30          # Flag methods with fan_out above this
  blast_radius_depth: 2          # Max depth for caller traversal
  max_files_per_commit: 15       # Max files to analyze per commit
  report_format: markdown        # markdown or json
  record_quality_history: true   # Record QualitySnapshot per commit
  quality_history_db_path: data/quality_history.duckdb  # Optional separate DB for snapshots
  include_paths:                 # Limit dogfooding to selected source roots
    - src
    - tests
  exclude_paths:                 # Skip generated or third-party code
    - .venv
    - node_modules
  claims_validation:
    enabled: true
    timeout: 5.0                 # Seconds per claim query
    rules:                       # Keyword→SQL mappings for validate-claims
      - keywords: ["handlers", "обработчиков"]
        sql: "SELECT COUNT(...) FROM nodes_method WHERE ..."
        description: "Scenario handler methods"
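To show how the two thresholds feed the quality summary, here is a small sketch. The `CONFIG` dict is a simplification of the YAML above, and `quality_summary` with its `name`, `cc`, and `fan_out` keys is a hypothetical shape, not the analyzer's actual interface.

```python
from typing import Dict, List

# Thresholds mirror the config above; the dict layout is a simplification.
CONFIG = {"cc_threshold": 10, "fan_out_threshold": 30}


def quality_summary(methods: List[dict], config: Dict[str, int] = CONFIG) -> Dict[str, List[str]]:
    """Flag methods whose metrics exceed the configured thresholds."""
    return {
        "high_cc": [m["name"] for m in methods if m["cc"] > config["cc_threshold"]],
        "high_fan_out": [
            m["name"] for m in methods if m["fan_out"] > config["fan_out_threshold"]
        ],
    }
```

Both comparisons are strict, following the "above this" wording in the config comments, so a method with CC exactly 10 is not flagged.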

CommitReport

The CommitReport dataclass (src/dogfooding/commit_analyzer.py) holds the full analysis result:

Field	Type	Description
`changed_files`	`List[str]`	Code files changed in the commit
`changed_methods`	`List[dict]`	Methods in changed files (deduplicated)
`blast_radius`	`Dict`	`{"callers": {method: [callers]}, "total_affected": N}`
`quality_summary`	`Dict`	High-CC, high-fan_out, TODO, debug, deprecated counts
`interface_impacts`	`List[dict]`	Interface layers affected (CLI, REST API, MCP, ACP)
`cross_module_alerts`	`List[dict]`	Related functions in other interface layers
`story_coverage_delta`	`List[dict]`	Story coverage gaps across layers
`is_cpg_fresh`	`bool`	Whether CPG was up-to-date
`analysis_time_ms`	`int`	Total analysis time in milliseconds
`deltas`	`List[dict]`	Before→after metric changes
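For orientation, the table can be read as the following dataclass. This is a reconstruction from the field list only: the class is named `CommitReportSketch` to make clear it is not the real definition, and the default values and the `is_empty` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommitReportSketch:
    """Field layout reconstructed from the table; defaults are assumptions."""
    changed_files: List[str] = field(default_factory=list)
    changed_methods: List[dict] = field(default_factory=list)
    blast_radius: Dict = field(default_factory=dict)
    quality_summary: Dict = field(default_factory=dict)
    interface_impacts: List[dict] = field(default_factory=list)
    cross_module_alerts: List[dict] = field(default_factory=list)
    story_coverage_delta: List[dict] = field(default_factory=list)
    is_cpg_fresh: bool = True
    analysis_time_ms: int = 0
    deltas: List[dict] = field(default_factory=list)

    def is_empty(self) -> bool:
        # Hypothetical helper: a report with no changed code files has nothing to show.
        return not self.changed_files
```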

Report Format

The review pipeline returns a markdown report as additionalContext in JSON:

{"additionalContext": "## Commit Analysis Report\n**Summary:** ..."}
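A consumer of this payload can recover the summary counts with the standard library alone. The sketch below is illustrative: `parse_summary` is a hypothetical helper, and it assumes the summary line keeps the "N label, N label" shape shown in this guide.

```python
import json
import re

raw = '{"additionalContext": "## Commit Analysis Report\\n**Summary:** 3 files, 45 methods, 2 high-CC, 1 TODO/FIXME, 128 affected callers"}'


def parse_summary(payload: str) -> dict:
    """Extract the numeric counts from the report's Summary line."""
    report = json.loads(payload).get("additionalContext", "")
    match = re.search(r"\*\*Summary:\*\* (.+)", report)
    if match is None:
        return {}  # empty {} payload or a report without a summary line
    counts = {}
    for part in match.group(1).split(", "):
        number, _, label = part.partition(" ")
        counts[label] = int(number)
    return counts


summary = parse_summary(raw)
```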

Full report structure (sections are omitted when empty):

## Commit Analysis Report
**Summary:** 3 files, 45 methods, 2 high-CC, 1 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**High fan-out methods:**
- `classify` (fan_out: 39)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more
- `_classify_domain` called by: `classify`, `get_morph` +4 more

**Interface changes detected:**
- **CLI**: `src/cli/intent_commands.py` (`add_intent_commands`, `_run_classify`)

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_intent`, `register_intent_tools`

**Story coverage check** — verify other interfaces:
- CLI changed (`intent`), check: MCP, REST API, ACP

*Analysis completed in 95ms*

Scaling to Other Projects

The dogfooding pipeline is project-agnostic. To set up for any project:

1. Import the project to create a CPG database:

   python -m src.cli import /path/to/project --language python

2. Register the project in config.yaml:

   projects:
     active: my_project
     registry:
       my_project:
         db_path: data/projects/my_project.duckdb
         source_path: /path/to/project
         language: python
         domain: python_generic

3. Verify the local runtime:

   python -m src.cli.import_commands dogfood status --db data/projects/my_project.duckdb
   python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1 --db data/projects/my_project.duckdb

The dogfood commands read the active project from config.yaml and resolve the correct database path automatically.

Troubleshooting

Analysis produces empty {} output:

- Check that the database file exists at the configured db_path.
- Verify the active project in config.yaml has a valid db_path.
- Run python -m src.cli.import_commands dogfood status to check freshness.
- Ensure the commit changed code files (.py, .go, .c, etc.), not just docs or configs.

CPG always shows stale:

- Ensure the gocpg binary exists at gocpg/gocpg.exe (or the configured GOCPG_PATH).
- Run python -m src.cli.import_commands dogfood status and inspect recommended_next_action.
- Try a manual update: gocpg/gocpg.exe update --input=. --output=<db>

CC values are 0 after incremental update:

- Incremental gocpg update may skip MethodMetricsPass for some entries. New entries can have cyclomatic_complexity=0.
- The deduplication logic keeps the entry with the highest CC value, mitigating this.
- If the problem persists, re-import the project from scratch: python -m src.cli import /path/to/source

DuckDB lock error (“file is being used by another process”):

- Another gocpg.exe process is running (for example from gocpg watch or a concurrent refresh).
- The runtime uses read-only connections and handles lock errors gracefully, falling back to subprocess queries.
- codegraph_runtime_cpg_watch_run update / ensure_fresh_with_details() return lock diagnostics (failure_kind=db_lock, locker_pids, auto_unlock_*, next_command) to speed up recovery.
- If the locker PID is the current Python process, auto-unlock intentionally skips killing itself; run the suggested next_command after closing the locker.

Delta report not appearing:

- The delta report only appears when the CPG was stale before the update (pre-update metrics were captured).
- If the CPG is already fresh (e.g., gocpg watch updated it), there are no pre-update metrics to compare against.

Timeout exceeded:

- The 58s budget (60s Claude Code limit minus 2s margin) accommodates most commits. For very large projects, gocpg update may exceed the 40s phase budget.
- Reduce max_files_per_commit in config.
- Ensure GoCPG indexes are up to date: gocpg/gocpg.exe index --db=<db>

OpenViking is missing from status:

- Start the local stack and confirm the OpenViking service is listening on the configured port.
- Run python -m src.cli.import_commands dogfood status again and inspect the openviking_status section.