Dogfooding Guide: CPG-Powered Commit Analysis

CodeGraph analyzes its own codebase through the Code Property Graph after every commit, creating a Plan-Act-Review feedback loop. Claude Code receives quality metrics, blast radius data, and before/after impact comparison as context immediately after committing code.

How It Works

The dogfooding pipeline connects three systems:

  1. GoCPG builds and maintains a Code Property Graph (DuckDB) with pre-computed metrics for every method: cyclomatic complexity, fan-in/fan-out, TODO/FIXME flags, debug code, deprecated usage.

  2. Git hooks trigger CPG updates after each commit, keeping the database synchronized with the codebase.

  3. Claude Code hooks fire after git commit commands, query the CPG for changed methods, compute quality metrics and blast radius, then inject the report back into the conversation as additional context.

The result: every commit produces an instant quality assessment without leaving the IDE or running separate tools.

Usage Scenarios

Scenario 1: Automatic post-commit feedback

The primary scenario. You work in Claude Code, make changes, commit:

You: "Commit these changes"
Claude: git add src/intent/classifier.py && git commit -m "refactor: extract pattern table"

The PostToolUse hook fires automatically and injects a report:

## Commit Analysis Report
**Summary:** 1 files, 45 methods, 2 high-CC, 3 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more

Claude sees this context and can react: “The refactoring reduced _get_fallback_domain complexity from 29 to 8. Two methods still have CC>10: classify and _score_domain.”

What triggers the hook: Any git commit command executed via the Bash tool. The hook detects "git commit" in the command string. Non-commit Bash commands (e.g., git status, ls) are ignored.

What does NOT trigger it: Direct terminal commits outside Claude Code, git commit --amend, or commits via other tools.
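The trigger check can be sketched as a small predicate over the Bash command string. This is a simplified illustration of the rules above, not the hook's actual code; the real hook simply looks for "git commit" in the command:

```python
import re

def is_commit_command(command: str) -> bool:
    """Sketch of the hook's trigger check: match a `git commit`
    invocation, but skip `--amend` per the rules described above."""
    if "--amend" in command:
        return False
    return re.search(r"\bgit\s+commit\b", command) is not None
```

A chained command such as `git add x && git commit -m "y"` matches, while `git status` or a plain `ls` does not.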

Scenario 2: Find and fix quality issues

Use CPG queries to find code quality targets, then fix them with the pipeline providing feedback:

You: "Query the CPG for methods with CC > 15 and TODO/FIXME flags in src/workflow/"
Claude: [runs DuckDB query]
  Found: _get_fallback_domain (CC=29, TODO), PolicyViolationsHandler.handle (CC=68, TODO)

You: "Refactor _get_fallback_domain to reduce complexity"
Claude: [extracts patterns to data table, replaces if/else chain with loop]

You: "Commit"
Claude: git commit -m "refactor: extract fallback patterns to class-level table"
  -> Hook fires, report shows: CC 29->8 (-21)

This is the full Plan-Act-Review loop:

  1. Plan: the CPG query identifies the problem.
  2. Act: the refactoring reduces complexity.
  3. Review: the hook confirms the improvement with concrete metrics.
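The Plan step can be scripted directly against the DuckDB file. A minimal sketch: `nodes_method` and `cyclomatic_complexity` come from the pipeline description, but the `has_todo` and `file_path` column names are assumptions about the GoCPG schema.

```python
HIGH_CC_TODO_SQL = """
SELECT full_name, cyclomatic_complexity
FROM nodes_method
WHERE cyclomatic_complexity > ?
  AND has_todo            -- flag column name is an assumption
  AND file_path LIKE ?    -- path column name is an assumption
ORDER BY cyclomatic_complexity DESC
"""

def find_quality_targets(db_path: str, path_prefix: str, cc_threshold: int = 15):
    """Run the Plan-step query against the CPG database (read-only)."""
    import duckdb  # imported lazily so the sketch loads without duckdb installed
    con = duckdb.connect(db_path, read_only=True)
    try:
        return con.execute(
            HIGH_CC_TODO_SQL, [cc_threshold, path_prefix + "%"]
        ).fetchall()
    finally:
        con.close()
```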

Scenario 3: Validate refactoring impact

Before making a large refactoring, check the blast radius:

You: "What's the blast radius if I change HierarchicalIntentClassifier.classify?"
Claude: [queries call_containment]
  213 direct callers across production code and tests

You: "Proceed with the refactoring"
Claude: [makes changes, commits]
  -> Hook shows: 213 affected callers, CC unchanged, no regressions

The blast radius report helps gauge the risk of changes before they happen.
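A blast-radius check against `call_containment` might look like the sketch below. The table name comes from the pipeline description; the `caller_full_name`/`callee_full_name` column names are assumptions for illustration.

```python
BLAST_RADIUS_SQL = """
SELECT DISTINCT caller_full_name      -- column names are assumptions
FROM call_containment
WHERE callee_full_name = ?
"""

def direct_callers(db_path: str, method_full_name: str) -> list:
    """Return the direct callers of a method from call_containment."""
    import duckdb  # imported lazily so the sketch loads without duckdb installed
    con = duckdb.connect(db_path, read_only=True)
    try:
        rows = con.execute(BLAST_RADIUS_SQL, [method_full_name]).fetchall()
        return [row[0] for row in rows]
    finally:
        con.close()
```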

Scenario 4: On-demand analysis

Run analysis without committing:

# Analyze the last commit
python -m src.cli.import_commands dogfood analyze --base-ref HEAD~1

# Analyze changes between branches
python -m src.cli.import_commands dogfood analyze --base-ref origin/main

# Generate a full quality report
python -m src.cli.import_commands dogfood report --format markdown

Pipeline Architecture

Data flow

git commit (via Bash tool in Claude Code)
    |
    v
PostToolUse hook fires (.claude/hooks/commit_analysis.py, 60s timeout)
    |
    +-- Phase 1: CPG freshness check
    |       Query cpg_git_state.commit_hash, compare to git rev-parse HEAD
    |
    +-- Phase 1.5: Capture pre-update metrics (for delta report)
    |       Query nodes_method for changed files BEFORE CPG update
    |       Store {full_name: {cc, fan_out, ...}} for later comparison
    |
    +-- Phase 2: CPG update (--force for accurate metrics)
    |       gocpg update --force --input=<source> --output=<db>
    |       Full re-parse ensures MethodMetricsPass computes CC/fan_in/fan_out
    |
    +-- Phase 3: Quality + blast radius analysis
    |       Query nodes_method for changed files (post-update)
    |       Deduplicate methods (normalize slash variants, keep highest CC)
    |       Compute quality summary: high-CC, high-fan_out, TODO, debug, deprecated
    |       Query call_containment for callers of changed methods
    |
    +-- Phase 4: Delta computation
    |       Compare pre-update vs post-update metrics
    |       Generate before->after report: "CC 29->8 (-21)"
    |
    +-- Output: {"additionalContext": "## Commit Analysis Report\n..."}
            Injected back into Claude Code conversation
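Phase 1's freshness check reduces to comparing two hashes. A sketch, with illustrative helper names:

```python
import subprocess
from typing import Optional

def head_commit(repo: str = ".") -> str:
    """Current HEAD hash, the git side of the comparison."""
    return subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout.strip()

def cpg_is_fresh(stored_hash: Optional[str], head: str) -> bool:
    """The CPG is fresh when cpg_git_state.commit_hash equals HEAD."""
    return stored_hash is not None and stored_hash == head
```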

Timeout budget

The hook has a 60-second total timeout with internal phases:

| Phase | Budget | Action |
| --- | --- | --- |
| Freshness check | 2s | Compare `cpg_git_state.commit_hash` to `git rev-parse HEAD` |
| Pre-update metrics | ~1s | Query current metrics for changed files (for the delta report) |
| CPG update (`--force`) | 40s | Run `gocpg update --force` for a full re-parse with metrics |
| Quality analysis | 8s | Query `nodes_method` for CC, TODO, debug, deprecated flags |
| Blast radius | 8s | Query `call_containment` for direct callers |

If any phase exceeds its budget, the hook degrades gracefully: it produces whatever data it has or returns empty {}.
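The degradation behavior can be sketched with a per-phase timeout wrapper that substitutes a fallback value instead of failing the whole hook. `run_phase` is a hypothetical helper, not the hook's actual code:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as PhaseTimeout

def run_phase(fn, budget_s, fallback):
    """Run one pipeline phase under its time budget; on timeout return
    the fallback so the hook degrades instead of failing outright."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=budget_s)
    except PhaseTimeout:
        return fallback
    finally:
        pool.shutdown(wait=False)
```

With per-phase fallbacks (empty metrics, empty caller list), a timeout in one phase still lets the hook emit whatever the earlier phases produced.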

CPG freshness and force re-parse

The hook passes the --force flag when running gocpg update. This triggers a full re-parse instead of an incremental update. The reason:

  • Incremental update (gocpg update without --force): Creates new method entries but skips MethodMetricsPass. New entries have cyclomatic_complexity=0, fan_in=0, fan_out=0. Fast but produces incomplete metrics.

  • Force re-parse (gocpg update --force): Runs the full parse pipeline including MethodMetricsPass. All metrics are computed correctly. Slower but accurate.

For the dogfooding use case, accuracy is more important than speed. The 40-second budget accommodates force re-parse for projects up to several hundred source files.

Method deduplication

GoCPG may store the same method with different filename formats (forward slash src/file.py vs backslash src\file.py). The analyzer deduplicates by normalizing full_name slashes and keeping the entry with the highest CC value:

# Before dedup: 2 entries for the same method
src\intent\classifier.py:Classifier.classify  CC=17
src/intent/classifier.py:Classifier.classify   CC=0  (from incremental update)

# After dedup: 1 entry, highest CC wins
src\intent\classifier.py:Classifier.classify  CC=17
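The rule amounts to normalizing the path separator in full_name and keeping the maximum-CC entry. A sketch, assuming rows are dicts with `full_name` and `cc` keys (the real row shape may differ):

```python
def dedup_methods(rows):
    """Collapse slash-variant duplicates of the same method, keeping
    the entry with the highest cyclomatic complexity."""
    best = {}
    for row in rows:
        key = row["full_name"].replace("\\", "/")  # normalize separators
        if key not in best or row["cc"] > best[key]["cc"]:
            best[key] = row
    return list(best.values())
```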

Delta report

When the CPG is stale (needs update), the hook captures pre-update metrics before running gocpg update --force, then compares against post-update metrics. This produces a delta showing the actual impact of changes:

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)
- `_build_quality_summary`: CC 5->3 (-2)

Methods with no metric changes are omitted. This helps developers immediately see whether their refactoring improved or degraded code quality.
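The delta computation can be sketched as a comparison of two metric snapshots keyed by full_name; the `cc`/`fan_out` field names are illustrative:

```python
def delta_lines(pre, post):
    """Emit before->after lines for methods whose metrics changed;
    unchanged methods are omitted, matching the report above."""
    out = []
    for name, after in post.items():
        before = pre.get(name)
        if before is None:
            continue
        parts = []
        if after["cc"] != before["cc"]:
            parts.append(f"CC {before['cc']}->{after['cc']} ({after['cc'] - before['cc']:+d})")
        if after["fan_out"] != before["fan_out"]:
            parts.append(f"FanOut {before['fan_out']}->{after['fan_out']} ({after['fan_out'] - before['fan_out']:+d})")
        if parts:
            out.append(f"- `{name}`: " + ", ".join(parts))
    return out
```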

Setup

One-command setup

python -m src.cli.import_commands dogfood setup --repo . --db data/projects/codegraph.duckdb

This installs git hooks (via gocpg) and verifies the Claude Code hook configuration.

Manual setup

  1. Install git hooks (background CPG update on commit):

     gocpg/gocpg.exe hooks install --repo=. --db=data/projects/codegraph.duckdb

  2. Configure Claude Code hooks in .claude/settings.json:

     {
       "hooks": {
         "PostToolUse": [{
           "matcher": "Bash",
           "hooks": [{
             "type": "command",
             "command": "python .claude/hooks/commit_analysis.py",
             "timeout": 60000
           }]
         }]
       }
     }

Note: The matcher field must be a string (regex pattern), not an object. "Bash" matches the Bash tool specifically.

Verify setup

python -m src.cli.import_commands dogfood status

Expected output shows CPG freshness, hook status, and database path.

CLI Commands

# Full setup (git hooks + Claude Code hooks)
python -m src.cli.import_commands dogfood setup [--repo PATH] [--db PATH] [--language LANG]

# Check CPG freshness and hook status
python -m src.cli.import_commands dogfood status [--db PATH]

# Run commit analysis on demand
python -m src.cli.import_commands dogfood analyze [--base-ref HEAD~1] [--db PATH]

# Generate quality report (markdown or JSON)
python -m src.cli.import_commands dogfood report [--format markdown|json] [--db PATH]

Configuration

In config.yaml:

dogfooding:
  enabled: true
  auto_update_cpg: true          # Run gocpg update if CPG is stale
  cpg_update_timeout: 40         # Seconds for CPG update
  analysis_timeout: 16           # Seconds for quality + blast radius
  cc_threshold: 10               # Flag methods with CC above this
  fan_out_threshold: 30          # Flag methods with fan_out above this
  blast_radius_depth: 2          # Max depth for caller traversal
  max_files_per_commit: 15       # Max files to analyze per commit
  report_format: markdown        # markdown or json

Report Format

The hook returns a markdown report as additionalContext in JSON:

{"additionalContext": "## Commit Analysis Report\n**Summary:** ..."}
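Emitting that envelope can be sketched as wrapping the markdown in a one-key JSON object, with an empty report degrading to an empty object as described in the pipeline section:

```python
import json

def hook_output(report_markdown: str) -> str:
    """Wrap the markdown report in the envelope Claude Code reads from
    the hook's stdout; an empty report degrades to an empty object."""
    if not report_markdown:
        return json.dumps({})
    return json.dumps({"additionalContext": report_markdown})
```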

Full report structure:

## Commit Analysis Report
**Summary:** 3 files, 45 methods, 2 high-CC, 1 TODO/FIXME, 128 affected callers
**CPG status:** fresh

**Impact of changes:**
- `_get_fallback_domain`: CC 29->8 (-21), FanOut 18->5 (-13)

**High complexity methods:**
- `classify` (CC: 17)
- `_score_domain` (CC: 16)

**High fan-out methods:**
- `classify` (fan_out: 39)

**Blast radius:** 128 callers affected
- `classify` called by: `run_intent_classifier`, `IntentBenchmark._evaluate_single` +126 more
- `_classify_domain` called by: `classify`, `get_morph` +4 more

*Analysis completed in 95ms*

Sections are omitted when empty (e.g., no high-CC methods = no “High complexity” section).

Scaling to Other Projects

The dogfooding pipeline is project-agnostic. To set up for any project:

  1. Import the project to create a CPG database:

     python -m src.cli import /path/to/project --language python

  2. Register the project in config.yaml:

     projects:
       active: my_project
       registry:
         my_project:
           db_path: data/projects/my_project.duckdb
           source_path: /path/to/project
           language: python
           domain: python_generic

  3. Run setup:

     python -m src.cli.import_commands dogfood setup --repo /path/to/project --db data/projects/my_project.duckdb

The hook reads the active project from config.yaml and resolves the correct database path automatically.
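That resolution is a two-key lookup over the parsed config.yaml. A sketch, assuming the registry structure shown in the registration step above:

```python
def resolve_db_path(config: dict) -> str:
    """Look up the active project's db_path in the parsed config.yaml."""
    projects = config["projects"]
    return projects["registry"][projects["active"]]["db_path"]
```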

Troubleshooting

Hook produces empty {} output:

  • Check that the database file exists at the configured db_path.
  • Verify the active project in config.yaml has a valid db_path.
  • Run python -m src.cli.import_commands dogfood status to check freshness.
  • Ensure the commit changed code files (.py, .go, .c, etc.), not just docs or configs.

CPG always shows stale:

  • Ensure the gocpg binary exists at gocpg/gocpg.exe (or the configured GOCPG_PATH).
  • Check that git hooks are installed: look for .git/hooks/post-commit.
  • Try a manual update: gocpg/gocpg.exe update --force --input=. --output=<db>

CC values are 0 after incremental update:

  • This happens when gocpg update runs without --force; the incremental update skips MethodMetricsPass.
  • The hook uses --force by default. If you see CC=0, the force re-parse may have timed out; check the 40s timeout budget.

DuckDB lock error ("file is being used by another process"):

  • Another gocpg.exe process is running (e.g., from gocpg watch or a concurrent hook invocation).
  • The hook uses read-only connections and handles lock errors gracefully, falling back to subprocess queries.

Delta report not appearing:

  • The delta report only appears when the CPG was stale before the update (pre-update metrics were captured).
  • If the CPG is already fresh (e.g., gocpg watch updated it), there are no pre-update metrics to compare against.

Timeout exceeded:

  • The 60s budget accommodates most commits. For very large projects, gocpg update --force may exceed the 40s phase budget.
  • Reduce max_files_per_commit in config or consider using an incremental update (remove force=True in commit_analysis.py).
  • Ensure GoCPG indexes are up to date: gocpg/gocpg.exe index --db=<db>