Workflow Scenarios Guide

CodeGraph supports 21 specialized analysis scenarios routed through a pre-retrieval pipeline.

How It Works

Queries flow through a LangGraph pipeline: Intent Classification → Pre-Retrieval → Routing → Scenario Handler → Response. The pre-retrieval phase (Phase E) enriches the query with CPG data before routing to the appropriate scenario handler. You can force a specific scenario via context={"scenario_id": "scenario_N"}.
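The routing order described above can be illustrated with a toy, self-contained sketch. The keyword table and the `classify_intent`, `pre_retrieve`, `route` and `run_pipeline` helpers are hypothetical stand-ins, not CodeGraph's LangGraph nodes. The sketch only shows the ordering of the stages and how a forced `scenario_id` in `context` overrides the classifier.

```python
from typing import Optional

# Hypothetical keyword table; CodeGraph's real intent classifier is not shown in this guide.
INTENT_KEYWORDS = {
    "security": ("injection", "overflow", "unsanitized", "taint"),
    "performance": ("complexity", "bottleneck", "race condition"),
    "onboarding": ("where is", "defined", "architecture"),
}
INTENT_TO_SCENARIO = {"onboarding": "scenario_1", "security": "scenario_2", "performance": "scenario_6"}


def classify_intent(query: str) -> str:
    """Stage 1: pick the first intent whose keywords appear in the query."""
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in q for k in keywords):
            return intent
    return "onboarding"  # hypothetical fallback intent


def pre_retrieve(query: str) -> dict:
    """Stage 2 placeholder for Phase E: a real system would attach CPG lookups here."""
    return {"query": query, "cpg_context": []}


def route(intent: str, context: Optional[dict] = None) -> str:
    """Stage 3: an explicit scenario_id in context wins over the classified intent."""
    if context and "scenario_id" in context:
        return context["scenario_id"]
    return INTENT_TO_SCENARIO[intent]


def run_pipeline(query: str, context: Optional[dict] = None) -> dict:
    intent = classify_intent(query)
    state = pre_retrieve(query)
    scenario_id = route(intent, context)
    return {"intent": intent, "scenario_id": scenario_id, "state": state}
```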


Scenario Overview

# | Scenario | Intent | Use Case
1 | Codebase Onboarding | onboarding | Navigate the codebase for new developers
2 | Security Audit | security | Comprehensive audit with taint analysis
3 | Documentation Generation | documentation | Auto-generate technical documentation
4 | Feature Development | feature_dev | Guidance for implementing new features
5 | Refactoring | refactoring | Refactoring recommendations with impact analysis
6 | Performance Analysis | performance | Identify performance bottlenecks
7 | Test Coverage | test_coverage | Test coverage analysis and recommendations
8 | Compliance Verification | compliance | OWASP Top 10, CWE, domain-specific standards
9 | Code Review | code_review | Automated PR/MR review
10 | Cross-Repository Analysis | cross_repo | Cross-module dependency analysis
11 | Architecture Analysis | architecture | Detect architectural constraint violations
12 | Tech Debt Assessment | tech_debt | Quantify technical debt
13 | Mass Refactoring | mass_refactoring | Automated mass refactoring (API migrations)
14 | Security Incident Response | security_incident | Incident investigation with recommendations
15 | Debugging Support | debugging | Data-flow-based debugging assistance
16 | Entry Points & Attack Surface | entry_points | API entry points and attack surface analysis
17 | File Editing | file_editing | AST-based precise code editing
18 | Code Optimization | code_optimization | Composite: S02, S05, S06, S11, S12 in parallel (60s timeout)
19 | Standards Check | standards_check | Composite: S08, S17, S18 sequentially (45s timeout)
20 | Dependency Analysis | dependencies | Dependency and import analysis
21 | Interface Docs Sync | interface_docs_sync | Documentation coverage across all interfaces

1. Codebase Onboarding

Navigate the codebase for new developers: find function/class/struct definitions, explore architecture.

Example Questions

Explain the project architecture
Find method 'heap_insert'
Where is AbortTransaction defined?
Show the definition of RelFileNode struct
Find all methods in file 'xact.c'

Usage

python -m src.cli query "Where is heap_insert defined?"

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Explain the project architecture")
print(result['answer'])

2. Security Audit

Comprehensive security audit: vulnerability detection, taint analysis, call graph analysis, CWE mapping.

Example Questions

Find SQL injection vulnerabilities
Show potential buffer overflows
Find unsanitized user input
Trace user input to query execution
What functions call LWLockAcquire?

Security Patterns Detected

  • SQL Injection (CWE-89)
  • Buffer Overflow (CWE-120)
  • Command Injection (CWE-78)
  • Format String (CWE-134)
  • Integer Overflow (CWE-190)
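Taint analysis of this kind reduces to a reachability search over data-flow edges from untrusted sources to sensitive sinks, where sanitizers stop propagation. A minimal sketch, assuming a hypothetical data-flow graph given as a plain dict; the node names (`read_param`, `quote_literal`, `exec_query`) are illustrative and are not PostgreSQL or CodeGraph identifiers.

```python
from collections import deque


def taint_paths(flows, sources, sinks, sanitizers=frozenset()):
    """Breadth-first search from each source; returns one shortest path per reachable sink.

    flows: node -> list of nodes its value flows into (hypothetical data-flow graph).
    Nodes listed in `sanitizers` are never entered, so flows through them are not reported.
    """
    paths = []
    for src in sources:
        queue = deque([[src]])
        seen = {src}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks and len(path) > 1:
                paths.append(path)
                continue
            for nxt in flows.get(node, []):
                if nxt in seen or nxt in sanitizers:
                    continue
                seen.add(nxt)
                queue.append(path + [nxt])
    return paths
```

Usage: with `flows = {"read_param": ["buf"], "buf": ["sql", "quote_literal"], "quote_literal": ["exec_query"], "sql": ["exec_query"]}`, the call `taint_paths(flows, ["read_param"], {"exec_query"}, {"quote_literal"})` reports only the unsanitized route through `sql`.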

Usage

python -m src.cli query "Find SQL injection vulnerabilities"

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find SQL injection vulnerabilities")

print(result['answer'])
# Detailed findings are in result['evidence'] and result['metadata']

3. Documentation Generation

Auto-generate technical documentation from source code.

Example Questions

Generate API documentation
Document the transaction subsystem
Create a summary of the buffer manager

Usage

python -m src.cli query "Document the transaction subsystem"

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Document the transaction subsystem")
print(result['answer'])

4. Feature Development

Guidance for implementing new features: placement recommendations, pattern examples, dependency navigation.

Example Questions

Where should I add a new endpoint?
Where should I place a new cache invalidation feature?
Find similar features to buffer management
Show pattern examples for executor subsystem

Features

  • Optimal placement: Recommends the best file and nearby method for new code
  • Pattern examples: Classifies existing methods (Initialization, Handler, Query, Validation, Cleanup patterns)
  • Confidence scoring: Indicates how well the feature description matches the recommended subsystem

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Where should I place a new cache invalidation feature?")
print(result['answer'])

See Feature Development Scenario for detailed examples.


5. Refactoring

Refactoring recommendations with impact analysis: dead code, duplication, extract-method opportunities.

Example Questions

Find unreachable functions
Find duplicate code blocks
Plan refactoring of the buffer manager
Find functions to split

Detected Patterns

  • Functions with no callers (dead code)
  • Unreachable code blocks
  • Copy-paste duplication
  • Long functions for method extraction
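The first two patterns can be computed from a call graph in two different ways, and the difference matters: "no callers" misses groups of functions that only call each other, while reachability from entry points catches them. A minimal sketch, assuming a hypothetical call graph as a dict of caller to callees and a caller-supplied list of entry points:

```python
def all_functions(call_graph):
    """Every function that appears as a caller or a callee."""
    return set(call_graph) | {c for callees in call_graph.values() for c in callees}


def functions_without_callers(call_graph, entry_points=()):
    """Functions that no other function calls, excluding declared entry points."""
    called = {c for callees in call_graph.values() for c in callees}
    entries = set(entry_points)
    return sorted(f for f in all_functions(call_graph) if f not in called and f not in entries)


def unreachable_from(call_graph, entry_points):
    """Functions not transitively reachable from any entry point (iterative DFS)."""
    seen, stack = set(), list(entry_points)
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(call_graph.get(fn, []))
    return sorted(all_functions(call_graph) - seen)
```

In a graph where `orphan_a` and `orphan_b` call each other, both have callers, so only the reachability check flags them.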

Usage

python -m src.cli query "Find unused functions"

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find unused functions")
print(result['answer'])
# Dead code details in result['metadata']

6. Performance Analysis

Identify performance bottlenecks: cyclomatic complexity, expensive loops, concurrency, memory allocation patterns.

Example Questions

Find N+1 database queries
Find functions with high cyclomatic complexity
Find O(n^2) patterns
Find race conditions
Show lock ordering issues

Metrics Analyzed

  • Cyclomatic complexity
  • Loop nesting depth
  • Function length and call frequency
  • Memory allocation patterns
  • Thread safety (locks, race conditions)

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find functions with complexity > 20")
print(result['answer'])
# Complexity metrics in result['metadata']

7. Test Coverage

Test coverage analysis and recommendations. Supports importing runtime coverage data from external tools.

Example Questions

Which functions are not covered by tests?
Find untested code
Show functions without tests

Importing Coverage Data

# Import pytest-cov JSON report
python -m src.cli coverage import --file coverage.json --format pytest-cov --db data/projects/postgres.duckdb

# Import lcov trace file
python -m src.cli coverage import --file coverage.lcov --format lcov

# Import Cobertura XML (Java/C#)
python -m src.cli coverage import --file coverage.xml --format cobertura --source-root /project

After importing, the “Find untested code” query automatically switches to hybrid mode, combining runtime coverage_percent values with heuristic test-caller analysis.
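Hybrid mode can be pictured as "prefer runtime data, fall back to the heuristic". The sketch below parses function hit counts from an lcov trace (`FN:` and `FNDA:` records are part of the lcov format) and uses a caller-supplied map of test callers only for functions with no runtime data. The function names and the decision rule are assumptions for illustration, not CodeGraph's implementation; hits are keyed by function name only, which is a simplification.

```python
def parse_lcov_function_hits(text: str) -> dict:
    """Map function name -> total execution count from lcov FN/FNDA records."""
    hits = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("FNDA:"):
            count, name = line[len("FNDA:"):].split(",", 1)
            hits[name] = hits.get(name, 0) + int(count)
        elif line.startswith("FN:"):
            # FN:<line>,<name> (older lcov) or FN:<start>,<end>,<name> (newer lcov)
            _, name = line[len("FN:"):].rsplit(",", 1)
            hits.setdefault(name, 0)
    return hits


def untested_functions(functions, runtime_hits, test_callers):
    """Runtime coverage decides when available; otherwise any calling test counts as covered."""
    untested = []
    for name in functions:
        if name in runtime_hits:
            covered = runtime_hits[name] > 0
        else:
            covered = bool(test_callers.get(name))
        if not covered:
            untested.append(name)
    return untested
```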

CPG-Based Test Recommendations

  • Branch coverage: Counts IF/FOR/SWITCH control structures and estimates required test cases
  • Parameter boundaries: Maps parameter types to boundary value suggestions (zero, null, empty, max)
  • Error paths: Counts TRY blocks and multiple RETURN statements indicating error handling
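The three heuristics above can be combined into a single recommendation record. This is a minimal sketch under stated assumptions: the "decisions plus one" estimate, the boundary-value table for C types, and the error-path formula (TRY blocks plus each RETURN beyond the first) are illustrative choices, not CodeGraph's documented rules.

```python
# Hypothetical mapping from C parameter types to zero/null/empty/max boundary values.
BOUNDARY_VALUES = {
    "int": ["0", "-1", "INT_MAX"],
    "size_t": ["0", "SIZE_MAX"],
    "const char *": ["NULL", '""'],
    "void *": ["NULL"],
}


def recommend_tests(control_kinds, params, try_blocks, return_count):
    """control_kinds: control-structure kinds found in the function, e.g. ["IF", "FOR"].
    params: parameter name -> C type string."""
    branch_points = sum(1 for kind in control_kinds if kind in {"IF", "FOR", "SWITCH"})
    return {
        # One baseline path plus one case per counted decision point.
        "branch_cases": branch_points + 1,
        "boundary_values": {name: BOUNDARY_VALUES.get(ctype, []) for name, ctype in params.items()},
        # Each TRY block and each additional RETURN suggests an error path to exercise.
        "error_paths": try_blocks + max(0, return_count - 1),
    }
```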

8. Compliance Verification

Verify code compliance with standards: OWASP Top 10, CWE, and domain-specific standards. The compliance handler adapts to the active domain plugin, because each domain provides its own compliance rules.

Example Questions

Check compliance with OWASP Top 10
Verify data handling requirements
Check code standards compliance

Usage

python -m src.cli query "Check compliance with OWASP Top 10"

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Check compliance with OWASP Top 10")
print(result['answer'])

9. Code Review

Automated code review for pull requests (PRs) and merge requests (MRs).

Example Questions

Review this PR for issues
Find potential bugs in this change
Check for style violations
Analyze test coverage for changes
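Reviewing a change usually starts by mapping diff hunks to the functions they touch. A minimal sketch, assuming a single-file unified diff and a hypothetical map of function names to their line spans in the new file; it reads only the `@@ -a,b +c,d @@` hunk headers, so context lines are included and the result slightly over-approximates the edited functions.

```python
import re

# Unified-diff hunk header; the new-file count defaults to 1 when omitted.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_ranges(diff_text: str) -> list:
    """(first_line, last_line) ranges in the new file for every non-empty hunk."""
    ranges = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            start = int(m.group(1))
            count = int(m.group(2)) if m.group(2) is not None else 1
            if count > 0:  # pure deletions have no lines in the new file
                ranges.append((start, start + count - 1))
    return ranges


def changed_functions(diff_text: str, functions: dict) -> list:
    """functions: name -> (first_line, last_line) in the new file (hypothetical input)."""
    ranges = changed_ranges(diff_text)
    return sorted(name for name, (lo, hi) in functions.items()
                  if any(lo <= r_hi and r_lo <= hi for r_lo, r_hi in ranges))
```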

Usage

# CLI review
python -m src.cli review --base-ref HEAD~5

# Demo script (interactive)
python examples/demo_patch_review.py --db data/projects/postgres.duckdb --interactive

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Review changes in the executor module")
print(result['answer'])

10. Cross-Repository Analysis

Cross-module dependency analysis and detection of code duplicated between repositories.

Example Questions

Find duplicate code between repo A and B
Show cross-module dependencies
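One simple way to find code duplicated between two repositories is to hash sliding windows of whitespace-normalized lines and intersect the hashes. This is a minimal sketch of that generic technique, not CodeGraph's detector; the repositories are assumed to be given as dicts of path to file text, and reported positions are indices among non-blank lines rather than source line numbers.

```python
import hashlib


def _normalize(line: str) -> str:
    """Collapse all runs of whitespace so indentation changes do not hide copies."""
    return " ".join(line.split())


def window_hashes(text: str, k: int) -> dict:
    """SHA-1 of every window of k consecutive non-blank lines -> first window index."""
    lines = [n for n in (_normalize(raw) for raw in text.splitlines()) if n]
    result = {}
    for i in range(len(lines) - k + 1):
        digest = hashlib.sha1("\n".join(lines[i:i + k]).encode()).hexdigest()
        result.setdefault(digest, i)
    return result


def find_duplicates(repo_a: dict, repo_b: dict, k: int = 3) -> list:
    """Pairs of ((path_a, window_a), (path_b, window_b)) sharing an identical k-line window."""
    index = {}
    for path, text in repo_a.items():
        for digest, pos in window_hashes(text, k).items():
            index.setdefault(digest, (path, pos))
    matches = []
    for path, text in repo_b.items():
        for digest, pos in window_hashes(text, k).items():
            if digest in index:
                matches.append((index[digest], (path, pos)))
    return matches
```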

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find duplicate code between repo A and B")
print(result['answer'])

11. Architecture Analysis

Detect architectural constraint violations, analyze subsystems, layers, and dependencies.

Example Questions

Find circular dependencies
Map the subsystem architecture
Show layer boundaries
Find architectural violations
Show subsystem diagram
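Circular dependencies correspond to strongly connected components of the dependency graph. The sketch below uses Tarjan's algorithm on a hypothetical module graph given as a dict of module to imported modules; it is a generic illustration rather than CodeGraph's implementation, and its recursion limits it to moderately sized graphs.

```python
def find_cycles(graph: dict) -> list:
    """Tarjan's SCC algorithm; returns components that form a cycle (size > 1 or self-loop)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    cycles = []
    counter = 0

    def strongconnect(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) > 1 or v in graph.get(v, []):
                cycles.append(sorted(component))

    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    for v in sorted(nodes):
        if v not in index:
            strongconnect(v)
    return cycles
```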

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Map the PostgreSQL architecture")
print(result['answer'])
# Subsystem data in result['metadata'] and result['subsystems']

12. Tech Debt Assessment

Quantify technical debt.

Example Questions

Assess technical debt of this module
Find code with excessive coupling
Show modules needing refactoring
Identify maintenance hotspots

Debt Indicators

  • High complexity
  • Deep nesting
  • Long functions
  • High coupling
  • Missing error handling
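To turn these indicators into a ranked hotspot list, one can count how many indicators each function trips. A minimal sketch, assuming per-function metrics are already available; the thresholds, the use of fan-out as the coupling measure, and the two-indicator cutoff are hypothetical values chosen for illustration, not CodeGraph's configured limits.

```python
from dataclasses import dataclass


@dataclass
class FunctionMetrics:
    name: str
    complexity: int          # cyclomatic complexity
    nesting: int             # maximum nesting depth
    length: int              # lines of code
    fan_out: int             # distinct callees, used here as a coupling proxy
    has_error_handling: bool


# Hypothetical thresholds; adjust to the project's own standards.
THRESHOLDS = {"complexity": 15, "nesting": 4, "length": 100, "fan_out": 20}


def debt_indicators(m: FunctionMetrics) -> list:
    hits = [metric for metric, limit in THRESHOLDS.items() if getattr(m, metric) > limit]
    if not m.has_error_handling:
        hits.append("missing_error_handling")
    return hits


def hotspots(metrics, min_indicators: int = 2) -> list:
    """Functions tripping at least `min_indicators`, worst first, ties broken by name."""
    flagged = [(m.name, debt_indicators(m)) for m in metrics]
    flagged = [(name, hits) for name, hits in flagged if len(hits) >= min_indicators]
    return sorted(flagged, key=lambda item: (-len(item[1]), item[0]))
```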

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find technical debt hotspots")
print(result['answer'])
# Debt details in result['metadata']

13. Mass Refactoring

Automated mass refactoring across many files, such as API migrations and renames.

Example Questions

Plan migration from v1 to v2 API
Rename function X to Y across all files

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Plan migration from v1 to v2 API")
print(result['answer'])

14. Security Incident Response

Investigate security incidents: call-path tracing from entry points to vulnerabilities, CVE impact analysis, blast radius calculation, Mermaid attack path diagrams, taint flow analysis.
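Call-path tracing and blast radius are two directions of the same call-graph search: forward from entry points to the vulnerable function, and backward from it to every transitive caller. A minimal sketch over a hypothetical call graph (caller to callees); the function names are illustrative, and the output shape is not the `attack_paths` metadata format shown below.

```python
from collections import deque


def attack_paths(call_graph: dict, entry_points, target: str) -> dict:
    """Shortest call chain (BFS) from each entry point that can reach `target`."""
    paths = {}
    for entry in entry_points:
        parents = {entry: None}
        queue = deque([entry])
        while queue:
            fn = queue.popleft()
            if fn == target:
                chain = []
                while fn is not None:  # walk parent links back to the entry point
                    chain.append(fn)
                    fn = parents[fn]
                paths[entry] = chain[::-1]
                break
            for callee in call_graph.get(fn, []):
                if callee not in parents:
                    parents[callee] = fn
                    queue.append(callee)
    return paths


def blast_radius(call_graph: dict, target: str) -> set:
    """Every function that can transitively call `target`."""
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    seen, stack = set(), [target]
    while stack:
        for caller in callers.get(stack.pop(), ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen - {target}
```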

Example Questions

Trace the impact of CVE-XXXX
Find all code paths affected by this vulnerability
Show exploitation paths
Trace attack paths to vulnerable function
Find entry points that reach parse_query
Identify affected functions

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Trace impact of vulnerability in parse_query")
print(result['answer'])

# Attack paths available in metadata
for path in result['metadata'].get('attack_paths', []):
    print(f"{path.entry_point} -> {path.vulnerability} (chain: {path.chain_length})")

15. Debugging Support

Data-flow-based debugging assistance.

Example Questions

Find all elog(ERROR) locations
Show potential deadlocks
Find missing lock acquisitions

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find all elog(ERROR) locations")
print(result['answer'])

16. Entry Points & Attack Surface

API entry points and attack surface analysis: exported functions, hook functions.

Example Questions

Which functions accept user input?
Find all exported functions
Show main API entry points
Find hook functions

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find API entry points")
print(result['answer'])
# Entry point details in result['metadata']

17. File Editing

AST-based precise code editing.

Example Questions

Rename function X to Y across all files
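What makes syntax-aware editing more precise than text substitution is that it only touches real identifiers, never strings, comments, or longer names that merely contain the old one. The sketch below illustrates that idea for Python source with the standard-library `tokenize` module (a token-level approach, not CodeGraph's AST editor). A known limitation: attribute names such as `obj.foo` are renamed too.

```python
import io
import tokenize


def rename_identifier(source: str, old: str, new: str) -> str:
    """Replace NAME tokens equal to `old`; strings, comments and other names are untouched."""
    lines = source.splitlines(keepends=True)
    hits = [tok.start for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type == tokenize.NAME and tok.string == old]
    # Edit right-to-left so earlier column offsets on the same line stay valid.
    for row, col in reversed(hits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)
```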

18. Code Optimization (Composite)

Comprehensive optimization: composite scenario that runs sub-scenarios S02, S05, S06, S11, S12 in parallel (60s timeout).
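A parallel composite with an overall timeout can be sketched with `asyncio`: start every sub-scenario as a task, wait up to the deadline, cancel whatever is still running, and keep partial results. The sub-scenario callables, result labels, and timeout handling below are hypothetical and only illustrate the pattern; a sequential composite such as Standards Check would instead await each sub-scenario in turn.

```python
import asyncio


async def run_composite(sub_scenarios: dict, query: str, timeout: float) -> dict:
    """sub_scenarios: name -> async callable taking the query (hypothetical interface)."""
    tasks = {asyncio.create_task(fn(query)): name for name, fn in sub_scenarios.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    for task in pending:
        task.cancel()
    # Let cancelled tasks finish unwinding before reporting.
    await asyncio.gather(*pending, return_exceptions=True)

    results = {}
    for task, name in tasks.items():
        if task not in done:
            results[name] = "timed out"
        elif task.exception() is not None:
            results[name] = f"error: {task.exception()!r}"
        else:
            results[name] = task.result()
    return results
```

A single slow or failing sub-scenario therefore degrades the report instead of blocking it.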

Example Questions

Optimize the authorization module

19. Standards Check (Composite)

Code standards verification: composite scenario that runs S08, S17, S18 sequentially (45s timeout).

Example Questions

Check code against project standards

20. Dependency Analysis

Dependency and import analysis: module dependency graph.

Example Questions

Show the module dependency tree
What modules depend on storage?
Find circular dependencies
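A module dependency tree is a depth-limited walk of the import graph that must not loop forever on cycles. A minimal sketch, assuming a hypothetical dict of module to imported modules; a module already on the current path is printed once with a `(cycle)` marker and not expanded again.

```python
def dependency_tree(imports: dict, root: str, max_depth: int = 3) -> str:
    """Indented text tree of `root` and its imports, marking cycles."""
    lines = []

    def walk(module: str, depth: int, path: frozenset):
        marker = " (cycle)" if module in path else ""
        lines.append("  " * depth + module + marker)
        if marker or depth >= max_depth:
            return
        for dep in sorted(imports.get(module, [])):
            walk(dep, depth + 1, path | {module})

    walk(root, 0, frozenset())
    return "\n".join(lines)
```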

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Show dependencies of transaction module")
print(result['answer'])
# Dependency data in result['metadata']

21. Interface Documentation Sync

Check documentation coverage across all CodeGraph interfaces. Discovers code entities (endpoints, commands, tools, methods) and compares them against existing markdown documentation to find undocumented entities, stale docs, and signature mismatches.

Supported Interfaces

Six interfaces are supported, each with configurable source paths and documentation file mappings: REST API, CLI, MCP, ACP, gRPC, and WebSocket.

Example Questions

Check documentation coverage
Which endpoints are undocumented?
Find stale documentation entries
Show docs sync report for REST API and CLI

Drift Categories

  • UNDOCUMENTED: Code entity exists but has no documentation
  • STALE: Documented entity no longer exists in code
  • OUTDATED: Both exist but parameters/signatures differ
  • COVERED: Properly documented
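The four categories follow from comparing two name-to-signature maps. A minimal sketch, assuming code and documentation entities have already been extracted into dicts of entity name to parameter tuple; the entity names are made up, and defining the coverage ratio as covered entities over all entities present in code is an assumption, not necessarily how `coverage_ratio` is computed by the runner.

```python
def classify_drift(code_entities: dict, doc_entities: dict) -> dict:
    """Both arguments map entity name -> tuple of parameter names."""
    report = {"UNDOCUMENTED": [], "STALE": [], "OUTDATED": [], "COVERED": []}
    for name in sorted(code_entities.keys() | doc_entities.keys()):
        if name not in doc_entities:
            report["UNDOCUMENTED"].append(name)
        elif name not in code_entities:
            report["STALE"].append(name)
        elif tuple(code_entities[name]) != tuple(doc_entities[name]):
            report["OUTDATED"].append(name)
        else:
            report["COVERED"].append(name)
    return report


def coverage_ratio(report: dict) -> float:
    """Covered entities over all entities that exist in code (assumed definition)."""
    in_code = sum(len(report[k]) for k in ("UNDOCUMENTED", "OUTDATED", "COVERED"))
    return len(report["COVERED"]) / in_code if in_code else 1.0
```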

Usage

CLI

# Full report (markdown)
python -m src.cli docs-sync --db data/projects/codegraph.duckdb

# CI mode (exit 1 if coverage below threshold)
python -m src.cli docs-sync --check --format json

# Filter interfaces
python -m src.cli docs-sync --interfaces rest_api,cli --language ru

REST API

POST /api/v1/documentation/sync
{
  "interfaces": ["rest_api", "cli"],
  "language": "en",
  "output_format": "markdown"
}

MCP

codegraph_docs_sync(interfaces="rest_api,cli", language="en", output_format="json")

Programmatic

from src.workflow.scenarios.interface_docs_sync_composite import InterfaceDocsSyncRunner

runner = InterfaceDocsSyncRunner(db_path="data/projects/codegraph.duckdb")
result = runner.run()
print(result.markdown)
print(f"Coverage: {result.coverage_ratio:.1%}")

Composite Workflows

Beyond the numbered scenarios, CodeGraph provides composite workflows that orchestrate multiple scenarios together.

Audit Composite

Runs 9 sub-scenarios in parallel (600s timeout) covering 12 quality dimensions, and produces a consolidated audit report with deduplicated findings and false-positive (FP) reduction.

python -m src.cli audit --db data/projects/postgres.duckdb --language en --format json

Story Validation

Validates user stories against 4 interfaces (REST, CLI, MCP, gRPC) and checks that every “Done” story has a corresponding implementation in code.

python -m src.cli.import_commands dogfood validate-stories

Structural Pattern Search (CLI Feature)

Find code matching structural patterns with CPG-aware constraints (data flow, call graph, types, domain annotations). This is a CLI/API feature, not a numbered workflow scenario.

Example Questions

Find unchecked return values
Find malloc without free
Show functions matching error-handling anti-patterns
Find SQL query construction without parameterization
Find all functions with cyclomatic complexity > 20

Pattern Types

  • Syntactic: Tree-sitter CST patterns with metavariables ($VAR, $$ARGS, $_)
  • CPG-constrained: Patterns with data flow, call graph, type, and domain constraints
  • YAML rules: Pre-defined rules in configs/rules/ (190 rules across 14 languages)

Usage

CLI

# Ad-hoc pattern search
python -m src.cli patterns search 'malloc($x)' --lang c

# Scan with all rules
python -m src.cli patterns scan

# Scan specific rule
python -m src.cli patterns scan --rule unchecked-return

# Generate rule from description
python -m src.cli patterns generate "find unchecked return values" --lang c

# Autofix (dry run)
python -m src.cli patterns fix --dry-run

Combining Scenarios

Force a specific scenario through the context argument, or use a composite workflow to run several scenarios together:

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Force a specific scenario via context
result = copilot.run(
    "Analyze the executor module",
    context={"scenario_id": "scenario_2"}  # Security Audit
)
print(f"Answer: {result['answer']}")
print(f"Evidence: {result['evidence']}")

# Or run the composite audit across all dimensions
# python -m src.cli audit --db PATH

Next Steps