Scenario 09: Code Review

Automated code review for pull requests, patches, and local changes using CPG-powered analysis with handler-based fast path and LLM fallback.

Quick Start

```
# Select Code Review Scenario
/select 09
```

Or via CLI:

```bash
# Review changes against a base ref
python -m src.cli review --base-ref HEAD~3

# Review staged changes
python -m src.cli review --staged

# Review with SARIF output
python -m src.cli review --base-ref main --format sarif --sarif-file results.sarif
```

How It Works

Two-Phase Architecture

S09 uses the standard two-phase architecture: Phase 1 is handler-based (no LLM call), and Phase 2 is the LLM fallback:

```
Query -> CodeReviewIntentDetector.detect()
  |
  Phase 1: integrate_handlers(state)
    -> HandlerRegistry("code_review") -> match handler by intent
    -> handler.handle() -> ReviewReportFormatter -> response
    |
    handled=True  -> return formatted result (no LLM)
    handled=False -> Phase 2
  |
  Phase 2: code_review_workflow() [LLM fallback]
    -> 3 agents (PRAnalyzer, ContextAggregator, ReviewReporter)
    -> CallGraphAnalyzer (blast radius)
    -> PromptRegistry -> LLMInterface.generate() -> response
```

code_review_workflow() in src/workflow/scenarios/code_review.py first calls integrate_handlers(state). If a handler matches (handled=True), its result is returned without invoking the LLM. Otherwise, the full LLM workflow executes with 3 agents and graph analysis.
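The two-phase dispatch can be sketched as follows. This is a minimal illustration with simplified state and handler shapes, not the actual signatures: the real integrate_handlers, HandlerRegistry, and agent pipeline carry far more structure.

```python
from dataclasses import dataclass

@dataclass
class HandlerResult:
    handled: bool
    response: str = ""

def integrate_handlers(state: dict) -> HandlerResult:
    # Phase 1: try registered handlers in priority order; the first one
    # whose intent matches formats the answer without any LLM call.
    for handler in sorted(state.get("handlers", []), key=lambda h: h["priority"]):
        if handler["intent"] == state.get("intent"):
            return HandlerResult(handled=True, response=handler["handle"](state))
    return HandlerResult(handled=False)

def llm_fallback(state: dict) -> str:
    # Phase 2 placeholder: the real workflow runs 3 agents plus graph analysis.
    return f"LLM review for intent '{state.get('intent', 'general_review')}'"

def code_review_workflow(state: dict) -> str:
    result = integrate_handlers(state)
    if result.handled:
        return result.response   # fast path: no LLM call
    return llm_fallback(state)   # Phase 2: full LLM pipeline
```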

Additionally, CodeGraph provides a standalone ReviewPipeline (src/review/pipeline.py) for CLI-based review via python -m src.cli review, independent of the LangGraph orchestrator.

Intent Detection

CodeReviewIntentDetector(IntentDetector) in code_review_handlers/intent_detector.py defines 4 intents sorted by priority:

| Intent | Priority | Keywords (EN + RU) |
|---|---|---|
| pr_impact | 10 | PR, pull request, merge request, запрос на слияние |
| change_risk | 20 | risk, danger, regression, risk assessment, риск, регрессия |
| review_priority | 30 | priority, critical, urgent, приоритет, критичный |
| dependency_impact | 40 | dependency, import, coupling, зависимость, связанность |

Fallback: general_review (confidence=0.5) when no pattern matches.

Keyword matching uses keyword_match_morphological() for Russian morphology support.
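A minimal sketch of the priority-ordered matching with the general_review fallback, assuming plain substring search. The real detector uses keyword_match_morphological() for Russian morphology, and the 0.9 match confidence below is illustrative, not taken from the source.

```python
# (intent, priority, EN keywords) -- RU keywords omitted for brevity
INTENTS = [
    ("pr_impact", 10, ["pr", "pull request", "merge request"]),
    ("change_risk", 20, ["risk", "danger", "regression"]),
    ("review_priority", 30, ["priority", "critical", "urgent"]),
    ("dependency_impact", 40, ["dependency", "import", "coupling"]),
]

def detect(query: str) -> tuple[str, float]:
    q = query.lower()
    # Lower priority number wins: pr_impact is checked before change_risk, etc.
    for intent, _prio, keywords in sorted(INTENTS, key=lambda t: t[1]):
        if any(kw in q for kw in keywords):
            return intent, 0.9           # assumed confidence for a keyword hit
    return "general_review", 0.5         # documented fallback
```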

Handler Phase

4 Handlers

code_review_handlers/workflow.py registers 4 handlers in HandlerRegistry("code_review"):

| Handler | Priority | Intent | Description |
|---|---|---|---|
| CallerAnalysisHandler | 3 | caller/callee queries | 2-hop transitive caller analysis; finds callers-of-callers for the full blast radius |
| SignatureImpactHandler | 5 | signature change | Analyzes the impact of method signature changes on callers and dependents |
| PRImpactHandler | 10 | pr_impact | Git diff analysis via git diff --name-only; maps changed files to CPG methods |
| ChangeRiskHandler | 20 | change_risk | Computes a risk score (0.0–1.0) per method based on caller count, complexity, and module location |

All handlers inherit CodeReviewHandler(BaseHandler).

CallerAnalysisHandler detects caller/callee queries via bilingual keywords (EN: caller, callee, who calls, called by; RU: вызывающие, кто вызывает). Performs 2-hop transitive search to assess full blast radius.

PRImpactHandler extracts changed files via git diff --name-only {base_ref} HEAD. Supported extensions: .py, .go, .ts, .js, .c, .h, .java, .kt, .cs, .php.
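A sketch of the changed-file extraction, assuming a plain `git diff --name-only` subprocess call; the function names here are hypothetical, and the real handler goes on to map the surviving paths onto CPG methods.

```python
import subprocess

# Extensions the PR-impact pass considers reviewable (from the list above).
REVIEWABLE_EXTS = (".py", ".go", ".ts", ".js", ".c", ".h",
                   ".java", ".kt", ".cs", ".php")

def filter_reviewable(paths: list[str]) -> list[str]:
    # str.endswith accepts a tuple, so one call checks every extension.
    return [p for p in paths if p.endswith(REVIEWABLE_EXTS)]

def changed_files(base_ref: str = "HEAD~1") -> list[str]:
    # git diff --name-only {base_ref} HEAD, as described above.
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return filter_reviewable(out.splitlines())
```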

ReviewReportFormatter

ReviewReportFormatter in code_review_handlers/formatters/ formats handler results with:

  • Change risk badges (critical/high/medium/low)
  • Interface impact indicators
  • Caller chain visualization
  • Localization support (EN/RU)

LLM Phase

3 Agents

When no handler matches, code_review_workflow() executes the full LLM pipeline with 3 agents from src/code_review/review_agents.py:

| Agent | Role | Key Methods |
|---|---|---|
| PRAnalyzer | PR analysis, file diffing, dependency detection | analyze_pr(files), detect_dependencies(changes) |
| ContextAggregator | CPG context collection, method metadata | aggregate_context(methods), get_method_details(name) |
| ReviewReporter | Review report generation, finding prioritization | generate_report(findings), prioritize_findings(items) |

Pipeline:

  1. PRAnalyzer analyzes changed files and detects dependencies
  2. ContextAggregator collects CPG context for changed methods
  3. ReviewReporter generates the final review report
  4. An evidence list is built from findings and risk assessments
  5. PromptRegistry.get_agent_prompt("code_review", ...) builds the prompt
  6. LLMInterface().generate() produces the final response

CallGraphAnalyzer & Graph Insights

After the 3 agents complete, CallGraphAnalyzer(cpg) from src/analysis performs graph-based blast radius analysis:

  • Change impact: For each changed method, calls find_all_callers() and analyze_impact() to determine how many other methods are affected
  • Affected methods: Builds a transitive closure of callers up to 2 hops — methods that could break if the change is incorrect
  • Risk assessment: Combines caller count, callee count, and interface layer presence into a risk score
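The 2-hop transitive closure can be sketched as a breadth-first walk upward through the call graph. This assumes a simple callee-to-callers adjacency dict; the real CallGraphAnalyzer works over the CPG.

```python
def affected_methods(callers: dict[str, set[str]], changed: str,
                     max_hops: int = 2) -> dict[str, int]:
    # BFS upward through callers, recording hop distance for each
    # transitively affected method (1 = direct caller, 2 = caller-of-caller).
    frontier, seen = {changed}, {}
    for hop in range(1, max_hops + 1):
        nxt = set()
        for m in frontier:
            for c in callers.get(m, ()):
                if c not in seen and c != changed:
                    seen[c] = hop
                    nxt.add(c)
        frontier = nxt
    return seen
```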

Graph insights are stored in state["metadata"]["graph_insights"]:

| Category | Description |
|---|---|
| change_impact | Methods with the highest impact score, sorted by blast radius |
| affected_methods | Full list of transitively affected methods with hop distance |
| risk_assessment | Per-method risk (critical/high/medium/low) with contributing factors |

ReviewPipeline (Standalone)

Pipeline Architecture

ReviewPipeline in src/review/pipeline.py is a standalone pipeline (separate from the LangGraph scenario) that orchestrates the full CLI review cycle:

```
detect_project -> check_cpg_status -> load_scope -> get_changed_files
     -> quality_analysis -> security_analysis -> aggregate -> format_output
```

Key components:

  • ReviewAggregator (src/review/aggregator.py): combines quality and security findings and applies scope-aware filtering
  • ReviewReport (src/review/models.py): generates markdown, JSON, and SARIF 2.1.0 output
  • SecurityPRReview: optional security analysis for changed files (disabled with --no-security)

Scope-Aware Filtering

When the CPG is built with excluded directories (partial scope), ReviewAggregator applies scope-aware filtering:

  • Dead code demotion: dead code findings demoted to info severity, marked scope_limited (callers may exist in excluded modules)
  • Blast radius demotion: blast radius findings demoted to info severity, marked scope_limited
  • Test suppression: test-related findings suppressed entirely when include_tests=false
  • Complexity/Security: not affected (per-method metrics and real vulnerability detection)

A scope disclaimer (ParseScope.scope_disclaimer) is added to reports when filtering is active.
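The demotion rules above can be sketched as a single pass over the findings. Findings are modeled as plain dicts and the category names (`dead_code`, `blast_radius`, `test`) are assumptions for illustration; only the demote-to-info and suppress-tests behavior comes from the text.

```python
SCOPE_SENSITIVE = {"dead_code", "blast_radius"}   # assumed category names

def apply_scope_filter(findings: list[dict], is_partial: bool,
                       include_tests: bool) -> list[dict]:
    out = []
    for f in findings:
        f = dict(f)                       # don't mutate the caller's findings
        if not include_tests and f.get("category") == "test":
            continue                      # suppress test findings entirely
        if is_partial and f.get("category") in SCOPE_SENSITIVE:
            f["severity"] = "info"        # demote: callers may be out of scope
            f["scope_limited"] = True
        out.append(f)
    return out
```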

SARIF Output

ReviewPipeline supports SARIF 2.1.0 output for integration with GitHub/GitLab security tabs and IDE plugins:

```bash
python -m src.cli review --base-ref main --format sarif --sarif-file results.sarif
```

Exit codes: 0 = clean or medium/low only, 1 = critical or high findings detected.
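A sketch of the SARIF conversion and exit-code policy. The envelope below is a minimal valid SARIF 2.1.0 shape (one run, one result per finding); the tool name and the severity-to-level mapping are illustrative assumptions, not taken from src/review.

```python
# Assumed mapping from ReviewSeverity to SARIF result levels.
SARIF_LEVEL = {"critical": "error", "high": "error",
               "medium": "warning", "low": "note", "info": "note"}

def to_sarif(findings: list[dict]) -> dict:
    # Minimal SARIF 2.1.0 envelope: one run, one result per finding.
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "codegraph-review"}},  # hypothetical name
            "results": [{
                "ruleId": f["category"],
                "level": SARIF_LEVEL[f["severity"]],
                "message": {"text": f["title"]},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": f["file_path"]},
                    "region": {"startLine": f["line"]},
                }}],
            } for f in findings],
        }],
    }

def review_exit_code(findings: list[dict]) -> int:
    # Documented policy: fail (1) only on critical/high findings.
    return 1 if any(f["severity"] in ("critical", "high") for f in findings) else 0
```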

Data Models

Key models from src/review/models.py:

| Model | Key Fields |
|---|---|
| ParseScope | is_partial, tests_in_scope, scope_disclaimer, excluded_dirs |
| ReviewFinding | title, severity (ReviewSeverity), file_path, line, description, category, scope_limited |
| ReviewReport | findings, summary, scope, format_output (markdown/json/sarif) |
| ReviewSeverity | Enum: critical, high, medium, low, info |
| CPGStatus | freshness, parse_scope, language, method_count |

CLI Usage

Full CLI reference for python -m src.cli review:

```bash
# Review changes against a base ref
python -m src.cli review --base-ref HEAD~3

# Review staged changes
python -m src.cli review --staged

# Review specific files
python -m src.cli review --files src/api/main.py src/auth.py

# Skip security analysis
python -m src.cli review --no-security

# Output as JSON
python -m src.cli review --base-ref main --format json

# Output as SARIF
python -m src.cli review --base-ref main --format sarif --sarif-file results.sarif

# Write output to file
python -m src.cli review --base-ref HEAD~5 --output-file review.md
```

| Flag | Description |
|---|---|
| --base-ref | Git ref to diff against (e.g., HEAD~3, main, a commit SHA) |
| --staged | Review only staged (git add) changes |
| --files | Review specific files (space-separated paths) |
| --no-security | Skip the security analysis pass |
| --format | Output format: markdown (default), json, sarif |
| --sarif-file | Write SARIF output to the specified file |
| --output-file | Write formatted output to a file |

Change Risk Assessment

Each changed method receives a risk score (0.0–1.0) based on 4 factors:

| Factor | Description |
|---|---|
| Caller count | More callers means wider impact |
| Signature complexity | Parameter count |
| Core module location | Method lives in domain plugins or base services |
| Interface layer | +0.15 if the method is in CLI/API/MCP/ACP |

Levels: critical (≥0.8), high (≥0.6), medium (≥0.4), low (<0.4).
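An illustrative scoring function under assumed weights: only the +0.15 interface bonus and the level thresholds come from the text above; the caller, signature, and core-module weights below are invented for the sketch and will differ from ChangeRiskHandler's actual formula.

```python
def risk_score(caller_count: int, param_count: int,
               in_core_module: bool, in_interface_layer: bool) -> float:
    score = min(caller_count / 20, 0.4)            # assumed caller weight
    score += min(param_count / 10, 0.2)            # assumed signature weight
    score += 0.25 if in_core_module else 0.0       # assumed core-module weight
    score += 0.15 if in_interface_layer else 0.0   # documented interface bonus
    return min(score, 1.0)                         # clamp to the 0.0-1.0 range

def risk_level(score: float) -> str:
    # Documented thresholds: critical >= 0.8, high >= 0.6, medium >= 0.4.
    if score >= 0.8:
        return "critical"
    if score >= 0.6:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"
```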

Interface Impact Detection

When files in interface layers are changed, the code review identifies which interfaces are affected:

```
Interface Layers:
  CLI        -> src/cli/
  REST API   -> src/api/routers/
  MCP        -> src/mcp/tools/, src/mcp/
  ACP        -> src/acp/server/, src/acp/
```

Example Questions

  • What is the blast radius of this change?
  • Review my last 3 commits
  • Assess the risk of changes in src/api/
  • Who calls this method and what would break?
  • Show the impact of changing a function signature
  • Prioritize review findings by severity
  • What dependencies are affected by this PR?