Dogfooding Guide: CPG-Powered Commit Analysis

CodeGraph analyzes its own codebase through the Code Property Graph after every commit, creating a Plan-Act-Review feedback loop. Claude Code receives quality metrics, blast radius data, interface impact analysis, and before/after comparison as context immediately after committing code.

How It Works

The dogfooding pipeline connects three systems:

  1. GoCPG builds and maintains a Code Property Graph (DuckDB) with pre-computed metrics for every method: cyclomatic complexity, fan-in/fan-out, TODO/FIXME flags, debug code, deprecated usage.

  2. Git hooks trigger CPG updates after each commit, keep the database synchronized with the codebase, and can persist traced review artifacts per commit.

  3. Claude Code hooks fire after git commit commands, query the CPG for changed methods, compute quality metrics, blast radius, interface impact and cross-module alerts, then inject the report back into the conversation as additional context.

The result: every commit can produce a visible, traceable quality assessment without leaving the IDE or running separate tools.
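The hook contract described above can be sketched in a few lines. The payload field names (`tool_input`, `command`) and the `additionalContext` output key follow this guide's examples and should be treated as assumptions, not a canonical Claude Code hook API:

```python
def analyze_event(event: dict) -> dict:
    """Given one PostToolUse payload (read from stdin in a real hook),
    return the JSON response the hook would print to stdout."""
    command = event.get("tool_input", {}).get("command", "")
    if "git commit" not in command:
        return {}  # non-commit Bash commands are ignored
    # A real hook would run the full 7-phase analysis here; this sketch
    # only shows the contract: a markdown report under additionalContext.
    report = "## Commit Analysis Report\n**Summary:** ..."
    return {"additionalContext": report}
```

In the real hook, this response is serialized with `json.dumps` and printed, which Claude Code then injects as extra conversation context.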

Usage Scenarios

Scenario 1: Automatic post-commit feedback

The primary scenario. You work in Claude Code, make changes, commit:

You: "Commit these changes"
Claude: git add src/intent/classifier.py && git commit -m "refactor: extract pattern table"

The PostToolUse hook fires automatically and injects a report:

## Commit Analysis Report
**Summary:** 1 files, 45 methods, 2 high-CC, 3 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more

**Interface changes detected:**
- **CLI**: `src/cli/intent_commands.py` (`add_intent_commands`, `_run_classify`)

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_intent`, `register_intent_tools`

Claude sees this context and can react: “The refactoring reduced _get_fallback_domain complexity from 29 to 8. Two methods still have CC>10: classify and _score_domain. CLI was changed — check if the MCP tool needs updating.”

What triggers the hook: Any git commit command executed via the Bash tool. The hook detects "git commit" in the command string. Non-commit Bash commands (e.g., git status, ls) are ignored.

What does NOT trigger it: Direct terminal commits outside Claude Code, git commit --amend, or commits via other tools.

Scenario 1b: Durable post-commit trace outside the chat loop

For terminal commits, CI-like local workflows, or any session where you want a durable audit trail, enable GoCPG review tracing:

gocpg/gocpg.exe hooks install --repo=. --db=data/projects/codegraph.duckdb --review-trace
python scripts/review_trace_status.py <commit-sha> --watch

Each commit writes these files to data/reviews/:

  • <sha>.log — chronological worker log with heartbeats
  • <sha>.status.json — current phase and progress snapshot
  • <sha>.meta.json — final outcome (gocpg_update_ok, db_unlocked_after_update, review status)
  • <sha>.json / <sha>.md — machine-readable and human-readable review output

The traced worker narrows review scope to changed files, so post-commit review usually completes in seconds rather than rescanning the full diff base.

Scenario 2: Find and fix quality issues

Use CPG queries to find code quality targets, then fix them with the pipeline providing feedback:

You: "Query the CPG for methods with CC > 15 and TODO/FIXME flags in src/workflow/"
Claude: [runs DuckDB query]
  Found: _get_fallback_domain (CC=29, TODO), PolicyViolationsHandler.handle (CC=68, TODO)

You: "Refactor _get_fallback_domain to reduce complexity"
Claude: [extracts patterns to data table, replaces if/else chain with loop]

You: "Commit"
Claude: git commit -m "refactor: extract fallback patterns to class-level table"
  -> Hook fires, report shows: CC 29->8 (-21)

This is the full Plan-Act-Review loop:

  1. Plan: CPG query identifies the problem
  2. Act: Refactoring reduces complexity
  3. Review: Hook confirms the improvement with concrete metrics

Scenario 3: Validate refactoring impact

Before making a large refactoring, check the blast radius:

You: "What's the blast radius if I change HierarchicalIntentClassifier.classify?"
Claude: [queries call_containment]
  213 direct callers across production code and tests

You: "Proceed with the refactoring"
Claude: [makes changes, commits]
  -> Hook shows: 213 affected callers, CC unchanged, no regressions

The blast radius report helps gauge the risk of changes before they happen.
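The depth-limited traversal behind the blast radius number can be sketched in pure Python, mirroring the `blast_radius_depth` setting described in the Configuration section. The caller graph here is a plain dict for illustration; in the real pipeline it comes from CPG queries:

```python
from collections import deque

def blast_radius(callers: dict[str, list[str]], method: str, max_depth: int = 2) -> set[str]:
    """Return all transitive callers of `method` up to `max_depth` hops."""
    seen: set[str] = set()
    queue = deque([(method, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth budget exhausted on this branch
        for caller in callers.get(name, []):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, depth + 1))
    return seen
```

With depth 2 (the default), a method's callers and their callers are counted, which is why test harnesses and CLI entry points often dominate the total.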

Scenario 4: On-demand analysis

Run analysis without committing:

# Analyze the last commit
python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1

# Analyze changes between branches
python -m src.cli.import_commands dogfood analyze --base-ref origin/main

# Generate a full quality report
python -m src.cli.import_commands dogfood report --format markdown

# Validate numeric claims in documentation
python -m src.cli.import_commands dogfood validate-claims --path docs/

# Show quality trend across recent commits
python -m src.cli.import_commands dogfood trend --commits 20

Pipeline Architecture

Data flow

git commit (via Bash tool in Claude Code)
    |
    v
PostToolUse hook fires (.claude/hooks/commit_analysis.py, 58s timeout)
    |
    +-- CPG freshness check
    |       Query cpg_git_state.commit_hash, compare to git rev-parse HEAD
    |
    +-- Pre-update metrics capture (for delta report)
    |       Query nodes_method for changed files BEFORE CPG update
    |       Store {full_name: {cc, fan_out, ...}} for later comparison
    |
    +-- CPG update if stale
    |       gocpg update --input=<source> --output=<db>
    |
    +-- Phase 1: Get changed files
    |       git diff --name-only HEAD~1 HEAD, filter code extensions
    |
    +-- Phase 2: Get changed methods from CPG
    |       Query nodes_method for changed files, deduplicate
    |
    +-- Phase 3: Quality summary
    |       Compute high-CC, high-fan_out, TODO, debug, deprecated counts
    |
    +-- Phase 4: Blast radius
    |       Query call_containment (or nodes_call fallback) for callers
    |
    +-- Phase 5: Interface impact detection
    |       Check if changed files belong to interface layers (CLI, REST API, MCP, ACP)
    |
    +-- Phase 6: Cross-module alerts
    |       Find related functions in OTHER interface layers by keyword matching
    |
    +-- Phase 7: Story coverage delta
    |       Flag layers that changed vs layers not covered
    |
    +-- Record quality snapshot (cpg_quality_history table)
    |
    +-- Output: {"additionalContext": "## Commit Analysis Report\n..."}
            Injected back into Claude Code conversation

Timeout budget

The hook has a 58-second total timeout (2s margin from the 60s Claude Code limit):

| Phase | Budget | Action |
| --- | --- | --- |
| Freshness check | 2s | Compare `cpg_git_state.commit_hash` to `git rev-parse HEAD` |
| Pre-update metrics | ~1s | Query current metrics for changed files (for delta report) |
| CPG update if stale | 40s | Run `gocpg update --input=<source> --output=<db>` |
| Phases 1–7 | ~15s | Changed files, methods, quality, blast radius, interfaces, cross-module, story |

If any phase exceeds its budget, the hook degrades gracefully: it produces whatever data it has or returns empty {}.
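The graceful-degradation pattern can be sketched as a budget-aware phase runner. Names and structure here are illustrative, not the hook's actual implementation:

```python
import time

def run_phases(phases, total_budget_s: float = 58.0) -> dict:
    """Run (name, fn) phases until the budget runs out; skip failing phases."""
    results: dict = {}
    deadline = time.monotonic() + total_budget_s
    for name, fn in phases:
        if time.monotonic() >= deadline:
            break  # out of budget: return whatever we have so far
        try:
            results[name] = fn()
        except Exception:
            pass  # a failing phase degrades to "no data", not a hook error
    return results
```

A phase that raises or a budget that runs dry simply means the final report has fewer sections; the hook never fails the commit itself.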

CPG freshness and automatic update

The hook checks CPG freshness by comparing cpg_git_state.commit_hash to git rev-parse HEAD. If stale, it runs gocpg update:

gocpg update --input=<source_path> --output=<db_path>

This triggers an incremental update of the CPG database. The CPGFreshnessChecker class in src/dogfooding/cpg_freshness.py manages this:

from src.dogfooding.cpg_freshness import CPGFreshnessChecker

checker = CPGFreshnessChecker(db_path, repo_path=".", gocpg_binary="gocpg/gocpg.exe")
checker.is_fresh()           # True if CPG commit == HEAD
checker.commits_behind()     # Number of commits CPG is behind
checker.ensure_fresh(timeout=40.0, source_path=".")  # Update if stale
checker.status()             # Full status dict

Freshness checks now include git-head fallback logic for environments where git rev-parse HEAD is unreliable in subprocesses. The checker resolves HEAD from .git/HEAD, refs, and packed-refs (including worktree indirection) before returning head_commit as unknown.

When an update fails due to DuckDB lock contention, the detailed diagnostics include lock classification, lock-holder PIDs, the result of any auto-unlock attempt, and actionable next_step / next_command guidance.

Freshness is reported in two forms:

  • is_fresh_strict: exact commit match (cpg_commit == head_commit)
  • is_fresh: effective freshness (strict match OR no CPG-relevant file changes between commits, e.g. docs-only commits)
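The two freshness forms can be sketched as follows. The "docs-only" predicate here (every changed path ends with `.md`) is an assumption for illustration; the real checker uses a broader notion of CPG-relevant files:

```python
def is_fresh_strict(cpg_commit: str, head_commit: str) -> bool:
    """Strict freshness: the CPG was built exactly at HEAD."""
    return cpg_commit == head_commit

def is_fresh(cpg_commit: str, head_commit: str, changed_files: list[str]) -> bool:
    """Effective freshness: strict match OR no CPG-relevant changes."""
    if is_fresh_strict(cpg_commit, head_commit):
        return True
    # Assumed relevance filter: treat .md-only changes as irrelevant to the CPG.
    return all(path.endswith(".md") for path in changed_files)
```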

Automatic vs explicit freshness controls

| Mechanism | Trigger | Typical use |
| --- | --- | --- |
| Commit hook (`.claude/hooks/commit_analysis.py`) | Automatic on git commit via Bash tool | Normal plan/act/review loop after each commit |
| Plugin post-commit webhook trigger | Automatic (best-effort) | OpenCode plugin workflows where the API is running |
| `codegraph_watch check` / `codegraph_watch update` | Explicit/manual | Deterministic freshness checks in headless and CI |
| `CPGFreshnessChecker.ensure_fresh_with_details()` | Explicit/manual | Programmatic control and machine-readable failure diagnostics |

If your workflow depends on guaranteed freshness before analysis, prefer explicit codegraph_watch update over waiting for background hooks.

Method deduplication

GoCPG may store the same method with different filename formats (forward slash src/file.py vs backslash src\file.py). The analyzer deduplicates by normalizing full_name slashes and keeping the entry with the highest CC value:

# Before dedup: 2 entries for the same method
src\intent\classifier.py:Classifier.classify  CC=17
src/intent/classifier.py:Classifier.classify   CC=0  (from incremental update)

# After dedup: 1 entry, highest CC wins
src\intent\classifier.py:Classifier.classify  CC=17
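The deduplication rule can be sketched directly. The method dicts below mirror the fields this guide mentions (`full_name`, `cc`); real CPG rows carry more columns:

```python
def dedup_methods(methods: list[dict]) -> list[dict]:
    """Deduplicate methods by slash-normalized full_name, keeping highest CC."""
    best: dict[str, dict] = {}
    for m in methods:
        key = m["full_name"].replace("\\", "/")  # unify path separators
        if key not in best or m["cc"] > best[key]["cc"]:
            best[key] = m
    return list(best.values())
```

Keeping the highest-CC entry also mitigates the incremental-update case where fresh rows arrive with `cyclomatic_complexity=0` (see Troubleshooting).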

Delta report

When the CPG is stale (needs update), the hook captures pre-update metrics before running gocpg update, then compares against post-update metrics. This produces a delta showing the actual impact of changes:

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)
- `_build_quality_summary`: CC 5->3 (-2)

Methods with no metric changes are omitted.
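The delta computation can be sketched as a per-method comparison of pre- and post-update metric dicts, keeping only methods whose metrics actually changed. The dict shapes are illustrative:

```python
def compute_deltas(pre: dict[str, dict], post: dict[str, dict]) -> list[dict]:
    """Return [{method, changes: {metric: (before, after)}}] for changed methods."""
    deltas = []
    for name, after in post.items():
        before = pre.get(name)
        if before is None:
            continue  # method is new: no before/after comparison possible
        changed = {k: (before[k], v) for k, v in after.items()
                   if k in before and before[k] != v}
        if changed:
            deltas.append({"method": name, "changes": changed})
    return deltas
```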

Interface impact detection

The analyzer tracks 4 interface layers defined in INTERFACE_LAYERS:

| Layer | Path Patterns | Description |
| --- | --- | --- |
| CLI | `src/cli/` | CLI commands |
| REST API | `src/api/routers/` | API endpoints |
| MCP | `src/mcp/tools/`, `src/mcp/` | MCP tools |
| ACP | `src/acp/server/`, `src/acp/` | ACP handlers |

When a changed file belongs to an interface layer, the report includes an “Interface changes detected” section listing affected layers and methods.

Cross-module alerts

When a file in one interface layer changes, the analyzer searches for related functions in OTHER layers by extracting keywords from the changed filename and querying the CPG. For example, if src/cli/reindex_commands.py changes, it looks for functions with “reindex” in their name in MCP, REST API, etc.

The report includes a “Cross-module alert” section suggesting which layers to check:

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_reindex`, `register_reindex_tools`
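The keyword-extraction step can be sketched like this. The stopword list and splitting rules here are assumptions for illustration; the real analyzer may tokenize filenames differently:

```python
from pathlib import Path

# Assumed stopwords: structural name parts that carry no feature meaning.
STOPWORDS = {"commands", "tools", "router", "server", "src"}

def filename_keywords(path: str) -> set[str]:
    """Extract feature keywords from a changed filename, e.g. 'reindex'."""
    stem = Path(path).stem                  # "reindex_commands"
    parts = {p for p in stem.split("_") if p}
    return {p for p in parts if p not in STOPWORDS and len(p) > 2}
```

Each extracted keyword is then matched against function names in the other interface layers via CPG queries.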

Story coverage delta

When interface layers are changed, the analyzer flags which other layers may need updates to maintain feature parity. The report includes a “Story coverage check” section:

**Story coverage check** — verify other interfaces:
- CLI changed (`reindex`), check: MCP, REST API, ACP

Hook Infrastructure

Hook modules

The .claude/hooks/ directory contains the commit analysis hook and 4 utility modules:

| Module | Purpose |
| --- | --- |
| commit_analysis.py | PostToolUse hook — commit analysis (58s timeout) |
| _feedback.py | ReviewFinding and ReviewFeedback dataclasses |
| _metrics.py | Hook telemetry — log_hook_metric(), timed_hook() context manager |
| _utils.py | Shared utilities — get_active_project(), run_gocpg_query(), safe_json_output(), load_parse_scope() |
| _session_cache.py | Project cache with 1-hour TTL |

ReviewFinding and ReviewFeedback

Structured feedback protocol for Claude Code hooks:

@dataclass
class ReviewFinding:
    file: str = ""
    line: Optional[int] = None
    type: str = ""          # "complexity", "dead_code", "blast_radius", "interface_gap", etc.
    severity: str = "info"  # "critical", "high", "medium", "low", "info"
    message: str = ""
    suggestion: Optional[str] = None
    metric_value: Optional[float] = None
    scope_limited: bool = False  # True if finding may be affected by partial scope

@dataclass
class ReviewFeedback:
    status: str = "ok"      # "ok", "warning", "block"
    hook_name: str = ""
    project: Optional[str] = None
    cpg_status: Optional[str] = None
    findings: List[ReviewFinding] = field(default_factory=list)
    summary: str = ""
    duration_ms: float = 0.0
    scope_disclaimer: str = ""
    suppressed_count: int = 0

    def to_markdown(self) -> str: ...
    def output(self): ...  # Writes to stdout as additionalContext JSON

Hook metrics

_metrics.py records execution telemetry in data/hook_metrics.jsonl (JSONL format, one JSON object per line):

{"timestamp": "2026-03-07T12:34:56.789Z", "hook": "commit_analysis", "duration_ms": 3250.5, "findings": 5, "project": "codegraph", "cpg_status": "fresh", "status": "ok"}

The timed_hook() context manager automatically measures duration and logs the result:

from _metrics import timed_hook

with timed_hook("commit_analysis") as mctx:
    mctx["project"] = "codegraph"
    mctx["findings_count"] = 5
    # duration logged automatically on exit

View aggregated stats via CLI: python -m src.cli.import_commands dogfood hooks-status.
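A minimal sketch of what timed_hook() could look like: a context manager that yields a mutable metrics dict and appends one JSONL record on exit. The real _metrics.py may differ; the log path and field names follow this guide's examples:

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def timed_hook(hook_name: str, log_path: str = "data/hook_metrics.jsonl"):
    """Yield a metrics dict; log it as one JSONL line with duration on exit."""
    mctx = {"hook": hook_name, "status": "ok"}
    start = time.perf_counter()
    try:
        yield mctx
    except Exception:
        mctx["status"] = "error"
        raise
    finally:
        mctx["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
        Path(log_path).parent.mkdir(parents=True, exist_ok=True)
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(mctx) + "\n")
```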

Session cache

_session_cache.py caches the active project in .claude/.cache/session_project.json with a 1-hour TTL to avoid resolving the project on every hook invocation:

from _session_cache import get_project_with_cache, invalidate_cache

project = get_project_with_cache()  # Returns cached or resolves fresh
invalidate_cache()                   # Force re-resolution
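The TTL cache pattern can be sketched as follows. The cache file layout ({"project": ..., "cached_at": ...}) is an assumption; the real _session_cache.py may store additional fields:

```python
import json
import time
from pathlib import Path

CACHE_TTL_S = 3600  # 1 hour

def get_project_with_cache(resolve, cache_file: str) -> str:
    """Return the cached project if fresh; otherwise resolve and re-cache."""
    path = Path(cache_file)
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - data["cached_at"] < CACHE_TTL_S:
            return data["project"]         # fresh cache hit
    project = resolve()                    # slow path: resolve from config
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"project": project, "cached_at": time.time()}),
                    encoding="utf-8")
    return project
```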

Setup

One-command setup

python -m src.cli.import_commands dogfood setup --repo . --db data/projects/codegraph.duckdb

This installs git hooks (via gocpg) and configures the Claude Code hook.

Note: The setup command writes the hook under the SubcommandResult key in .claude/settings.json. The actual production configuration uses PostToolUse with "matcher": "Bash". If setup was run, you may need to manually adjust the hook key.

Manual setup

  1. Install git hooks (background CPG update on commit):

     gocpg/gocpg.exe hooks install --repo=. --db=data/projects/codegraph.duckdb

  2. Configure Claude Code hooks in .claude/settings.json:

     {
       "hooks": {
         "PostToolUse": [{
           "matcher": "Bash",
           "hooks": [{
             "type": "command",
             "command": "python .claude/hooks/commit_analysis.py",
             "timeout": 60000
           }]
         }]
       }
     }

Note: The matcher field must be a string (regex pattern), not an object. "Bash" matches the Bash tool specifically.

The full production .claude/settings.json includes 5 hooks across all lifecycle points:

| Lifecycle Point | Hook | Timeout |
| --- | --- | --- |
| UserPromptSubmit | enrich_prompt.py | 15s |
| SessionStart | session_context.py | 10s |
| Stop | post_analysis.py | 10s |
| PreToolUse | pre_tool_use.py | 8s |
| PostToolUse (Bash) | cli_error_monitor.py + commit_analysis.py | 5s + 60s |

Verify setup

python -m src.cli.import_commands dogfood status

Expected output shows CPG freshness, hook status, and database path.

CLI Commands

All commands are accessed via python -m src.cli.import_commands dogfood <subcommand>.

dogfood setup

Install git hooks and configure Claude Code hooks:

python -m src.cli.import_commands dogfood setup [--repo PATH] [--db PATH] [--language LANG]

dogfood status

Check CPG freshness and hook status:

python -m src.cli.import_commands dogfood status [--db PATH]

dogfood analyze

Run commit analysis on demand:

python -m src.cli.import_commands dogfood analyze [--base-ref HEAD~1] [--db PATH]

dogfood report

Generate quality report (markdown or JSON):

python -m src.cli.import_commands dogfood report [--format markdown|json] [--db PATH]

dogfood validate-claims

Validate numeric claims in documentation against the CPG. Extracts numbers from markdown (e.g., “95 handlers”, “12 scenarios”) and verifies via SQL:

python -m src.cli.import_commands dogfood validate-claims [--path PATH] [--db PATH]

Claim rules are defined in config.yaml under dogfooding.claims_validation.rules[]. Each rule maps keywords (English and Russian) to a SQL query:

claims_validation:
  enabled: true
  timeout: 5.0
  rules:
    - keywords: ["handlers", "обработчиков"]
      sql: "SELECT COUNT(DISTINCT full_name) FROM nodes_method WHERE ..."
      description: "Scenario handler methods"

dogfood trend

Show quality trend across recent commits from the cpg_quality_history table:

python -m src.cli.import_commands dogfood trend [--commits N] [--db PATH]

Output is an ASCII table with columns: Commit, Date, Methods, Avg CC, Dead, Hi-CC, TODO.

Quality snapshots are recorded automatically after each commit analysis via record_snapshot() in src/dogfooding/quality_history.py.

dogfood hooks-status

Show hook execution statistics from the JSONL metrics log (data/hook_metrics.jsonl):

python -m src.cli.import_commands dogfood hooks-status [--last N] [--hook NAME]

Aggregates by hook name: runs, ok/error/warning counts, fail%, avg/max duration, total findings.
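The aggregation can be sketched as a pass over the JSONL lines, grouping by hook name. Field names follow the metrics record shown earlier; the actual CLI implementation may compute additional statistics:

```python
import json
from collections import defaultdict

def aggregate_metrics(lines: list[str]) -> dict[str, dict]:
    """Group JSONL metric records by hook: runs, status counts, avg/max duration."""
    stats: dict[str, dict] = defaultdict(
        lambda: {"runs": 0, "ok": 0, "error": 0, "durations": []}
    )
    for line in lines:
        rec = json.loads(line)
        s = stats[rec["hook"]]
        s["runs"] += 1
        if rec.get("status") in ("ok", "error"):
            s[rec["status"]] += 1
        s["durations"].append(rec.get("duration_ms", 0.0))
    for s in stats.values():
        s["avg_ms"] = sum(s["durations"]) / len(s["durations"])
        s["max_ms"] = max(s["durations"])
    return dict(stats)
```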

dogfood validate-stories

Validate user story interface coverage via CPG using StoryValidationRunner:

python -m src.cli.import_commands dogfood validate-stories [--stories 2,8,11] [--path FILE] [--output FILE] [--db PATH] [--go-db PATH]

dogfood config-check

Detect orphan configuration parameters by cross-referencing YAML config, schema, and code usage:

python -m src.cli.import_commands dogfood config-check [--format text|json|csv] [--level error|warning|info|all] [--fix-suggestions] [--config PATH] [--schema PATH] [--source DIR...]

| Parameter | Default | Description |
| --- | --- | --- |
| --format | text | Output format: text, json, or csv |
| --level | all | Minimum severity level to show |
| --fix-suggestions | off | Show fix suggestions for each finding |
| --config | config.yaml | Path to YAML config file |
| --schema | src/config/unified_config.py | Path to schema file |
| --source | src/ | Source directories to scan (multiple allowed) |

Detects 6 orphan types: yaml_unused, yaml_missing, code_orphan, path_mismatch, orphaned_dataclass, unused_default. Uses ConfigOrphanAnalyzer from src/analysis/config_analyzer.py.

dogfood maintain-db

Perform routine CPG maintenance and cleanup:

python -m src.cli.import_commands dogfood maintain-db [--db PATH] [--vacuum] [--force]

| Parameter | Default | Description |
| --- | --- | --- |
| --db | auto-detected | DuckDB database path |
| --vacuum | off | Run DuckDB VACUUM after maintenance |
| --force | off | Continue even when the command detects a risky state |

Use this command when quality history tables, review traces, or stale maintenance markers need a controlled cleanup step.

dogfood continue

Resume an interrupted dogfooding workflow from the stored review state:

python -m src.cli.import_commands dogfood continue [--db PATH] [--review-dir PATH] [--json]

| Parameter | Default | Description |
| --- | --- | --- |
| --db | auto-detected | DuckDB database path |
| --review-dir | data/reviews | Directory with persisted review state |
| --json | off | Return machine-readable status for automation |

This is the recovery path when a post-commit review was interrupted and you want to continue from the last saved checkpoint instead of starting over.

dogfood unlock-db

Release a blocked DuckDB database after lock contention diagnostics:

python -m src.cli.import_commands dogfood unlock-db [--db PATH] [--force] [--json]

| Parameter | Default | Description |
| --- | --- | --- |
| --db | auto-detected | DuckDB database path |
| --force | off | Attempt the unlock flow even when the detector is not fully certain |
| --json | off | Return machine-readable diagnostics |

Use this when dogfood status, dogfood analyze, or the post-commit hook report DuckDB lock contention and recommend an explicit unlock step.

Configuration

In config.yaml:

dogfooding:
  enabled: true
  auto_update_cpg: true          # Run gocpg update if CPG is stale
  cpg_update_timeout: 40         # Seconds for CPG update
  analysis_timeout: 16           # Seconds for quality + blast radius
  cc_threshold: 10               # Flag methods with CC above this
  fan_out_threshold: 30          # Flag methods with fan_out above this
  blast_radius_depth: 2          # Max depth for caller traversal
  max_files_per_commit: 15       # Max files to analyze per commit
  report_format: markdown        # markdown or json
  record_quality_history: true   # Record QualitySnapshot per commit
  quality_history_db_path: data/quality_history.duckdb  # Optional separate DB for snapshots
  include_paths:                 # Limit dogfooding to selected source roots
    - src
    - tests
  exclude_paths:                 # Skip generated or third-party code
    - .venv
    - node_modules
  claims_validation:
    enabled: true
    timeout: 5.0                 # Seconds per claim query
    rules:                       # Keyword→SQL mappings for validate-claims
      - keywords: ["handlers", "обработчиков"]
        sql: "SELECT COUNT(...) FROM nodes_method WHERE ..."
        description: "Scenario handler methods"

CommitReport

The CommitReport dataclass (src/dogfooding/commit_analyzer.py) holds the full analysis result:

| Field | Type | Description |
| --- | --- | --- |
| changed_files | List[str] | Code files changed in the commit |
| changed_methods | List[dict] | Methods in changed files (deduplicated) |
| blast_radius | Dict | `{"callers": {method: [callers]}, "total_affected": N}` |
| quality_summary | Dict | High-CC, high-fan_out, TODO, debug, deprecated counts |
| interface_impacts | List[dict] | Interface layers affected (CLI, REST API, MCP, ACP) |
| cross_module_alerts | List[dict] | Related functions in other interface layers |
| story_coverage_delta | List[dict] | Story coverage gaps across layers |
| is_cpg_fresh | bool | Whether CPG was up-to-date |
| analysis_time_ms | int | Total analysis time in milliseconds |
| deltas | List[dict] | Before→after metric changes |

Report Format

The hook returns a markdown report as additionalContext in JSON:

{"additionalContext": "## Commit Analysis Report\n**Summary:** ..."}

Full report structure (sections are omitted when empty):

## Commit Analysis Report
**Summary:** 3 files, 45 methods, 2 high-CC, 1 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**High fan-out methods:**
- `classify` (fan_out: 39)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more
- `_classify_domain` called by: `classify`, `get_morph` +4 more

**Interface changes detected:**
- **CLI**: `src/cli/intent_commands.py` (`add_intent_commands`, `_run_classify`)

**Cross-module alert** — related interfaces may need updates:
- Changed CLI → check MCP: `codegraph_intent`, `register_intent_tools`

**Story coverage check** — verify other interfaces:
- CLI changed (`intent`), check: MCP, REST API, ACP

*Analysis completed in 95ms*

Scaling to Other Projects

The dogfooding pipeline is project-agnostic. To set up for any project:

  1. Import the project to create a CPG database:

     python -m src.cli import /path/to/project --language python

  2. Register the project in config.yaml:

     projects:
       active: my_project
       registry:
         my_project:
           db_path: data/projects/my_project.duckdb
           source_path: /path/to/project
           language: python
           domain: python_generic

  3. Run setup:

     python -m src.cli.import_commands dogfood setup --repo /path/to/project --db data/projects/my_project.duckdb

The hook reads the active project from config.yaml and resolves the correct database path automatically.

Troubleshooting

Hook produces empty {} output:
  • Check that the database file exists at the configured db_path
  • Verify the active project in config.yaml has a valid db_path
  • Run python -m src.cli.import_commands dogfood status to check freshness
  • Ensure the commit changed code files (.py, .go, .c, etc.), not just docs or configs

CPG always shows stale:
  • Ensure the gocpg binary exists at gocpg/gocpg.exe (or the configured GOCPG_PATH)
  • Check that git hooks are installed: look for .git/hooks/post-commit
  • Try a manual update: gocpg/gocpg.exe update --input=. --output=<db>

CC values are 0 after incremental update:
  • Incremental gocpg update may skip MethodMetricsPass for some entries; new entries can have cyclomatic_complexity=0.
  • The deduplication logic keeps the entry with the highest CC value, mitigating this.
  • If the problem persists, re-import the project from scratch: python -m src.cli import /path/to/source

DuckDB lock error (“file is being used by another process”):
  • Another gocpg.exe process is running (e.g., from gocpg watch or a concurrent hook invocation).
  • The hook uses read-only connections and handles lock errors gracefully, falling back to subprocess queries.
  • codegraph_watch update / ensure_fresh_with_details() return lock diagnostics (failure_kind=db_lock, locker_pids, auto_unlock_*, next_command) to speed up recovery.
  • If the locker PID is the current Python process, auto-unlock intentionally skips killing itself; run the suggested next_command after closing the locker.

Delta report not appearing:
  • The delta report only appears when the CPG was stale before the update (pre-update metrics were captured).
  • If the CPG is already fresh (e.g., gocpg watch updated it), there are no pre-update metrics to compare against.

Timeout exceeded:
  • The 58s budget (60s Claude Code limit minus 2s margin) accommodates most commits. For very large projects, gocpg update may exceed the 40s phase budget.
  • Reduce max_files_per_commit in config.
  • Ensure GoCPG indexes are up to date: gocpg/gocpg.exe index --db=<db>

Hook metrics not appearing:
  • Check that data/hook_metrics.jsonl exists and is writable.
  • Run python -m src.cli.import_commands dogfood hooks-status to see aggregated stats.