Workflow Scenarios Guide¶

CodeGraph supports 21 specialized analysis scenarios.

Table of Contents¶

Scenario Overview
1. Codebase Onboarding
2. Security Audit
3. Documentation Generation
4. Feature Development
5. Refactoring
6. Performance Analysis
7. Test Coverage
8. Compliance Verification
9. Code Review
10. Cross-Repository Analysis
11. Architecture Analysis
12. Tech Debt Assessment
13. Mass Refactoring
14. Security Incident Response
15. Debugging Support
16. Entry Points & Attack Surface
17. File Editing
18. Code Optimization
19. Standards Check
20. Dependency Analysis
21. Structural Pattern Search
Combining Scenarios
Next Steps

Scenario Overview¶

#	Scenario	Use Case
1	Codebase Onboarding	Navigate the codebase for new developers
2	Security Audit	Comprehensive audit with taint analysis
3	Documentation Generation	Auto-generate technical documentation
4	Feature Development	Guidance for implementing new features
5	Refactoring	Refactoring recommendations with impact analysis
6	Performance Analysis	Identify performance bottlenecks
7	Test Coverage	Test coverage analysis and recommendations
8	Compliance Verification	OWASP, GDPR, ISO 27001 compliance checks
9	Code Review	Automated PR/MR review
10	Cross-Repository Analysis	Cross-module dependency analysis
11	Architecture Analysis	Detect architectural constraint violations
12	Tech Debt Assessment	Quantify technical debt
13	Mass Refactoring	Automated mass refactoring (API migrations)
14	Security Incident Response	Incident investigation with recommendations
15	Debugging Support	Data-flow-based debugging assistance
16	Entry Points & Attack Surface	API entry points and attack surface analysis
17	File Editing	AST-based precise code editing
18	Code Optimization	Comprehensive optimization (security, refactoring, architecture)
19	Standards Check	Code standards verification
20	Dependency Analysis	Dependency and import analysis
21	Structural Pattern Search	Find code patterns with CPG constraints

1. Codebase Onboarding¶

Navigate the codebase for new developers: find function/class/struct definitions, explore architecture.

Example Questions¶

Explain the project architecture
Find method 'heap_insert'
Where is AbortTransaction defined?
Show the definition of RelFileNode struct
Find all methods in file 'xact.c'

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Explain the project architecture", scenario="onboarding")

print(result['answer'])

2. Security Audit¶

Comprehensive security audit: vulnerability detection, taint analysis, call graph analysis, CWE mapping.

Example Questions¶

Find SQL injection vulnerabilities
Show potential buffer overflows
Find unsanitized user input
Trace user input to query execution
What functions call LWLockAcquire?

Security Patterns Detected¶

SQL Injection (CWE-89)
Buffer Overflow (CWE-120)
Command Injection (CWE-78)
Format String (CWE-134)
Integer Overflow (CWE-190)

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find SQL injection vulnerabilities")

for vuln in result['vulnerabilities']:
    print(f"CWE-{vuln['cwe']}: {vuln['description']}")
    print(f"  File: {vuln['file']}:{vuln['line']}")
    print(f"  Severity: {vuln['severity']}")
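
For triage, it can help to group findings by severity before printing them. The following is a minimal sketch, assuming each entry carries the `cwe`, `file`, `line`, and `severity` keys used above. The sample data and the severity labels (`critical`, `high`, `medium`, `low`) are hypothetical stand-ins for `result['vulnerabilities']`.

```python
from collections import defaultdict

# Hypothetical sample shaped like result['vulnerabilities'] above.
sample_vulnerabilities = [
    {"cwe": 89, "description": "Query built from user input",
     "file": "src/query.c", "line": 120, "severity": "high"},
    {"cwe": 134, "description": "Non-literal format string",
     "file": "src/log.c", "line": 12, "severity": "medium"},
    {"cwe": 120, "description": "Unbounded copy into fixed buffer",
     "file": "src/buffer.c", "line": 45, "severity": "high"},
]

# Assumed label ordering; adjust to the labels your results actually contain.
SEVERITY_ORDER = ["critical", "high", "medium", "low"]


def severity_rank(label):
    """Rank known labels by SEVERITY_ORDER; unknown labels sort last."""
    if label in SEVERITY_ORDER:
        return SEVERITY_ORDER.index(label)
    return len(SEVERITY_ORDER)


def group_by_severity(vulnerabilities):
    """Return {severity: [vuln, ...]}, each list sorted by file and line."""
    groups = defaultdict(list)
    for vuln in vulnerabilities:
        groups[vuln.get("severity", "unknown")].append(vuln)
    for items in groups.values():
        items.sort(key=lambda v: (v["file"], v["line"]))
    return dict(groups)


grouped = group_by_severity(sample_vulnerabilities)
for severity in sorted(grouped, key=severity_rank):
    print(f"{severity}: {len(grouped[severity])} finding(s)")
    for vuln in grouped[severity]:
        print(f"  CWE-{vuln['cwe']} at {vuln['file']}:{vuln['line']}")
```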

3. Documentation Generation¶

Auto-generate technical documentation from source code.

Example Questions¶

Generate API documentation
Document the transaction subsystem
Create a summary of the buffer manager

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Document the transaction subsystem")

print(result['documentation'])
# Markdown-formatted documentation

4. Feature Development¶

Guidance for implementing new features: placement recommendations, pattern examples, dependency navigation.

Example Questions¶

Where should I add a new endpoint?
Where should I place a new cache invalidation feature?
Find similar features to buffer management
Show pattern examples for executor subsystem

Features¶

Optimal placement: Recommends the best file and nearby method for new code
Pattern examples: Classifies existing methods (Initialization, Handler, Query, Validation, Cleanup patterns)
Confidence scoring: Indicates how well the feature description matches the recommended subsystem

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Where should I place a new cache invalidation feature?")
print(result['answer'])

See Feature Development Scenario for detailed examples.

5. Refactoring¶

Refactoring recommendations with impact analysis: dead code, duplication, extract-method opportunities.

Example Questions¶

Find unreachable functions
Find duplicate code blocks
Plan refactoring of the buffer manager
Find functions to split

Detected Patterns¶

Functions with no callers (dead code)
Unreachable code blocks
Copy-paste duplication
Long functions that are candidates for method extraction

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find unused functions")

for dead in result['dead_code']:
    print(f"Unused: {dead['name']} in {dead['file']}")
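
The "functions with no callers" criterion reduces to a set difference over the call graph. The sketch below illustrates the idea on a hand-written edge list. The function names, the `(caller, callee)` edge format, and the `entry_points` parameter are assumptions for illustration, not CodeGraph's internal representation.

```python
def find_uncalled(functions, call_edges, entry_points=()):
    """Return functions that no other function calls.

    Self-recursive calls are ignored, so a function that only calls itself
    still counts as uncalled. Entry points are excluded because they are
    invoked from outside the analyzed code.
    """
    called = {callee for caller, callee in call_edges if caller != callee}
    excluded = set(entry_points)
    return sorted(f for f in functions if f not in called and f not in excluded)


# Hypothetical call graph.
functions = ["main", "parse", "helper", "old_helper", "recurse"]
call_edges = [
    ("main", "parse"),
    ("parse", "helper"),
    ("recurse", "recurse"),  # only ever calls itself
]

uncalled = find_uncalled(functions, call_edges, entry_points=["main"])
for name in uncalled:
    print(f"Unused: {name}")
```

Functions reached only through function pointers or hooks would show up as false positives in a purely static check like this one, so results are best treated as candidates for review.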

6. Performance Analysis¶

Identify performance bottlenecks: cyclomatic complexity, expensive loops, concurrency, memory allocation patterns.

Example Questions¶

Find N+1 database queries
Find functions with high cyclomatic complexity
Find O(n^2) patterns
Find race conditions
Show lock ordering issues

Metrics Analyzed¶

Cyclomatic complexity
Loop nesting depth
Function length and call frequency
Memory allocation patterns
Thread safety (locks, race conditions)

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find functions with complexity > 20")

for func in result['complex_functions']:
    print(f"{func['name']}: complexity={func['complexity']}")
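
As a reminder of what the complexity number measures, here is a small sketch that counts decision points with Python's standard `ast` module. It is an illustration of the metric on Python source, not the way CodeGraph derives it from the CPG. The set of counted node types is a common simplification and an assumption of this example.

```python
import ast

# Node types that each add one independent path (a simplified rule set).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(func_node):
    """Return 1 plus the number of decision points inside func_node."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths.
            complexity += len(node.values) - 1
    return complexity


def function_complexities(source):
    """Map each function defined in source to its complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }


SOURCE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

for name, value in function_complexities(SOURCE).items():
    print(f"{name}: complexity={value}")
```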

7. Test Coverage¶

Test coverage analysis and recommendations. Supports importing runtime coverage data from external tools.

Example Questions¶

Which functions are not covered by tests?
Find untested code
Show functions without tests

Importing Coverage Data¶

# Import pytest-cov JSON report
python -m src.cli coverage import --file coverage.json --format pytest-cov --db data/projects/postgres.duckdb

# Import lcov trace file
python -m src.cli coverage import --file coverage.lcov --format lcov

# Import Cobertura XML (Java/C#)
python -m src.cli coverage import --file coverage.xml --format cobertura --source-root /project

After importing, the “Find untested code” query automatically switches to hybrid mode, combining runtime coverage_percent values with heuristic test-caller analysis.
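
One way to picture hybrid mode: measured coverage decides when it exists, and the test-caller heuristic fills the gaps. The sketch below is an assumption about how such a combination could work, not a description of CodeGraph's actual logic. Only `coverage_percent` comes from the text above; the `test_callers` field, the status labels, and the sample functions are hypothetical.

```python
def coverage_status(function):
    """Classify a function record using runtime data first, heuristics second."""
    percent = function.get("coverage_percent")
    if percent is not None:
        # Runtime data is available: trust the measurement.
        return "covered" if percent > 0 else "untested"
    # No runtime data: fall back to whether any test calls the function.
    return "likely covered" if function.get("test_callers") else "likely untested"


# Hypothetical function records.
sample_functions = [
    {"name": "func_a", "coverage_percent": 87.5},
    {"name": "func_b", "coverage_percent": 0.0},
    {"name": "func_c", "test_callers": ["test_func_c"]},
    {"name": "func_d"},
]

for function in sample_functions:
    print(f"{function['name']}: {coverage_status(function)}")
```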

CPG-Based Test Recommendations¶

Branch coverage: Counts IF/FOR/SWITCH control structures and estimates required test cases
Parameter boundaries: Maps parameter types to boundary value suggestions (zero, null, empty, max)
Error paths: Counts TRY blocks and multiple RETURN statements indicating error handling
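
The parameter-boundary idea can be sketched as a lookup from a parameter's type to the boundary values listed above (zero, null, empty, max). The type buckets and the C type names below are assumptions chosen for illustration; the actual mapping used by CodeGraph is not documented here.

```python
# Assumed set of integer-like C types.
INTEGER_TYPES = {"int", "long", "unsigned int", "size_t"}


def boundary_values(type_name):
    """Suggest boundary values to test for one parameter type."""
    normalized = " ".join(type_name.replace("const", " ").split())
    if normalized.endswith("*"):
        # Strings can be null or empty; other pointers can only be null.
        return ["null", "empty"] if "char" in normalized else ["null"]
    if normalized in INTEGER_TYPES:
        return ["zero", "max"]
    return []


def suggest_tests(parameters):
    """Map each (name, type) parameter to its suggested boundary values."""
    return {name: boundary_values(type_name) for name, type_name in parameters}


# Hypothetical function signature: copy_name(const char *name, size_t len, void *ctx)
suggestions = suggest_tests([
    ("name", "const char *"),
    ("len", "size_t"),
    ("ctx", "void*"),
])
for name, values in suggestions.items():
    print(f"{name}: {', '.join(values)}")
```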

8. Compliance Verification¶

Verify code compliance with standards: OWASP Top 10, GDPR, ISO 27001.

Example Questions¶

Check compliance with OWASP Top 10
Verify GDPR data handling requirements
Check ISO 27001 compliance

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Check compliance with OWASP Top 10")
print(result['answer'])

9. Code Review¶

Automated code review for pull requests and merge requests.

Example Questions¶

Review this PR for issues
Find potential bugs in this change
Check for style violations
Analyze test coverage for changes

Usage¶

python demo_patch_review.py --patch changes.diff

Or programmatically:

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Review changes in path/to/changes.diff")

for finding in result['findings']:
    print(f"{finding['severity']}: {finding['description']}")

10. Cross-Repository Analysis¶

Analyze cross-module dependencies and detect code duplicated between repositories.

Example Questions¶

Find duplicate code between repo A and B
Show cross-module dependencies

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find duplicate code between repo A and B")
print(result['answer'])

11. Architecture Analysis¶

Detect architectural constraint violations, analyze subsystems, layers, and dependencies.

Example Questions¶

Find circular dependencies
Map the subsystem architecture
Show layer boundaries
Find architectural violations
Show subsystem diagram

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Map the PostgreSQL architecture")

for subsystem in result['subsystems']:
    print(f"{subsystem['name']}: {subsystem['description']}")

12. Tech Debt Assessment¶

Quantify technical debt.

Example Questions¶

Assess technical debt of this module
Find code with excessive coupling
Show modules needing refactoring
Identify maintenance hotspots

Debt Indicators¶

High complexity
Deep nesting
Long functions
High coupling
Missing error handling

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find technical debt hotspots")

for debt in result['debt_items']:
    print(f"{debt['location']}: {debt['type']} (severity: {debt['severity']})")

13. Mass Refactoring¶

Automated mass refactoring: API migrations, renames.

Example Questions¶

Plan migration from v1 to v2 API
Rename function X to Y across all files

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Plan migration from v1 to v2 API")
print(result['answer'])

14. Security Incident Response¶

Investigate security incidents: call-path tracing from entry points to vulnerabilities, CVE impact analysis, blast radius calculation, Mermaid attack path diagrams, taint flow analysis.

Example Questions¶

Trace the impact of CVE-XXXX
Find all code paths affected by this vulnerability
Show exploitation paths
Trace attack paths to vulnerable function
Find entry points that reach parse_query
Identify affected functions

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Trace impact of vulnerability in parse_query")

# Attack paths from entry points to vulnerability
for path in result['metadata'].get('attack_paths', []):
    print(f"{path.entry_point} -> {path.vulnerability} (chain: {path.chain_length})")
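
Call-path tracing from an entry point to a vulnerable function is, at its core, a shortest-path search over the call graph. The sketch below shows that idea with a breadth-first search and renders the result as a Mermaid flowchart. The call graph, the entry point names, and the helper names are hypothetical; only `parse_query` is taken from the example above, and the objects in `attack_paths` may carry more information than these plain lists.

```python
from collections import deque


def shortest_call_path(call_graph, entry_point, target):
    """Breadth-first search for the shortest call chain, or None if unreachable."""
    queue = deque([[entry_point]])
    visited = {entry_point}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in visited:
                visited.add(callee)
                queue.append(path + [callee])
    return None


def to_mermaid(path):
    """Render a call chain as a left-to-right Mermaid flowchart."""
    lines = ["graph LR"]
    for caller, callee in zip(path, path[1:]):
        lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "handle_request": ["authenticate", "run_query"],
    "admin_cli": ["run_query"],
    "health_check": [],
    "run_query": ["parse_query"],
}

attack_paths = {}
for entry in ["handle_request", "admin_cli", "health_check"]:
    path = shortest_call_path(call_graph, entry, "parse_query")
    if path is not None:
        attack_paths[entry] = path

for entry, path in attack_paths.items():
    print(f"{entry} -> parse_query (chain: {len(path)})")
print(to_mermaid(attack_paths["handle_request"]))
```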

15. Debugging Support¶

Data-flow-based debugging assistance.

Example Questions¶

Find all elog(ERROR) locations
Show potential deadlocks
Find missing lock acquisitions

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find all elog(ERROR) locations")
print(result['answer'])

16. Entry Points & Attack Surface¶

API entry points and attack surface analysis: exported functions, hook functions.

Example Questions¶

Which functions accept user input?
Find all exported functions
Show main API entry points
Find hook functions

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find API entry points")

for entry in result['entry_points']:
    print(f"Entry: {entry['name']} ({entry['type']})")

17. File Editing¶

AST-based precise code editing.

Example Questions¶

Rename function X to Y across all files

18. Code Optimization¶

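Comprehensive optimization: a composite scenario that runs sub-scenarios S02 (Security Audit), S05 (Refactoring), S06 (Performance Analysis), S11 (Architecture Analysis), and S12 (Tech Debt Assessment) in parallel with a 60s timeout.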

Example Questions¶

Optimize the authorization module
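
Running sub-scenarios in parallel under a shared timeout follows a familiar pattern. The sketch below shows it with `concurrent.futures` from the standard library. It is a generic illustration, not CodeGraph's scheduler: `run_parallel`, the stub tasks, and the shortened 0.5s budget are all assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_parallel(tasks, timeout_seconds):
    """Run zero-argument callables concurrently under one shared timeout.

    Returns (results, timed_out). results maps a task name to its return
    value, or to the exception it raised. timed_out lists the tasks that
    were still running when the timeout expired.
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(tasks)))
    futures = {executor.submit(task): name for name, task in tasks.items()}
    done, not_done = wait(list(futures), timeout=timeout_seconds)
    # Threads cannot be killed, so stragglers are abandoned, not stopped.
    # cancel_futures requires Python 3.9 or newer.
    executor.shutdown(wait=False, cancel_futures=True)

    results = {}
    for future in done:
        error = future.exception()
        results[futures[future]] = error if error is not None else future.result()
    timed_out = sorted(futures[future] for future in not_done)
    return results, timed_out


def slow_stub():
    """Hypothetical sub-scenario that overruns the budget."""
    time.sleep(1.5)
    return "finished too late"


# Hypothetical stand-ins for sub-scenarios.
stub_tasks = {
    "S02": lambda: "security findings",
    "S05": lambda: "refactoring findings",
    "slow": slow_stub,
}

# A 0.5s budget keeps the demo short; the scenario above uses 60s.
results, timed_out = run_parallel(stub_tasks, timeout_seconds=0.5)
print(results)
print(timed_out)
```

Returning partial results alongside the list of timed-out tasks lets the caller report what finished instead of failing the whole composite run.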

19. Standards Check¶

Code standards verification: a composite scenario that runs S08 (Compliance Verification), S17 (File Editing), and S18 (Code Optimization) sequentially with a 45s timeout.

Example Questions¶

Check code against project standards
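
Sequential execution under a time budget can be sketched as a loop that checks a deadline before each step. This example assumes the 45s applies to the whole chain rather than to each step, which the text above does not specify. `run_sequential` and the stub steps are hypothetical, and a step that is already running is not interrupted.

```python
import time


def run_sequential(steps, budget_seconds, clock=time.monotonic):
    """Run (name, callable) steps in order until the time budget is spent.

    Returns (results, skipped): results maps step names to return values,
    skipped lists steps that were not started because the deadline passed.
    """
    deadline = clock() + budget_seconds
    results, skipped = {}, []
    for name, step in steps:
        if clock() >= deadline:
            skipped.append(name)
            continue
        results[name] = step()
    return results, skipped


# Hypothetical stand-ins for the sub-scenarios.
stub_steps = [
    ("S08", lambda: "compliance findings"),
    ("S17", lambda: "edit suggestions"),
    ("S18", lambda: "optimization findings"),
]

results, skipped = run_sequential(stub_steps, budget_seconds=45)
print(results)
print(skipped)
```

The injectable `clock` parameter exists only to make the deadline logic testable without waiting.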

20. Dependency Analysis¶

Dependency and import analysis: module dependency graph.

Example Questions¶

Show the module dependency tree
What modules depend on storage?
Find circular dependencies

Usage¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Show dependencies of transaction module")

for dep in result['dependencies']:
    print(f"{dep['from']} -> {dep['to']}")
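
Circular dependencies can be found with a depth-first search over edges in the `from`/`to` shape printed above. The following is a minimal sketch of that search. The sample edges and module names are hypothetical, and it reports only the first cycle it encounters.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    graph = {}
    for dep in dependencies:
        graph.setdefault(dep["from"], []).append(dep["to"])

    in_progress, finished = set(), set()
    stack = []

    def visit(node):
        in_progress.add(node)
        stack.append(node)
        for neighbor in graph.get(node, ()):
            if neighbor in in_progress:
                # Back edge: the cycle is the stack from neighbor onward.
                return stack[stack.index(neighbor):] + [neighbor]
            if neighbor not in finished:
                cycle = visit(neighbor)
                if cycle:
                    return cycle
        stack.pop()
        in_progress.discard(node)
        finished.add(node)
        return None

    for node in list(graph):
        if node not in finished:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical edges shaped like result['dependencies'].
sample_dependencies = [
    {"from": "transaction", "to": "storage"},
    {"from": "storage", "to": "buffer"},
    {"from": "buffer", "to": "transaction"},
    {"from": "executor", "to": "transaction"},
]

cycle = find_cycle(sample_dependencies)
if cycle:
    print(" -> ".join(cycle))
```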

21. Structural Pattern Search¶

Find code matching structural patterns with CPG-aware constraints (data flow, call graph, types, domain annotations).

Example Questions¶

Find unchecked return values
Find malloc without free
Show functions matching error-handling anti-patterns
Find SQL query construction without parameterization
Find all functions with cyclomatic complexity > 20

Pattern Types¶

Syntactic: Tree-sitter CST patterns with metavariables ($VAR, $$ARGS, $_)
CPG-constrained: Patterns with data flow, call graph, type, and domain constraints
YAML rules: Pre-defined rules in configs/rules/ (190 rules across 14 languages)
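
To build intuition for metavariables, the toy matcher below translates a pattern into a regular expression. It is a deliberately simplified stand-in: the real syntactic patterns operate on Tree-sitter CSTs, not on text, and the exact semantics assumed here (`$VAR` binds one argument-like span, `$$ARGS` binds the rest of an argument list, `$_` matches without binding) are this example's assumptions.

```python
import re

# Longest alternatives first so "$$ARGS" is not read as "$" + "$ARGS".
METAVARIABLE = re.compile(r"\$\$[A-Za-z_]\w*|\$_|\$[A-Za-z]\w*")


def compile_pattern(pattern):
    """Translate a metavariable pattern into a compiled regular expression."""
    parts, position, seen = [], 0, set()
    for match in METAVARIABLE.finditer(pattern):
        parts.append(re.escape(pattern[position:match.start()]))
        token = match.group()
        if token == "$_":
            parts.append(r"[^,()]+")            # anonymous wildcard
        else:
            name = token.lstrip("$")
            if name in seen:
                parts.append(f"(?P={name})")    # repeated name must match again
            elif token.startswith("$$"):
                seen.add(name)
                parts.append(f"(?P<{name}>[^()]*)")   # remaining arguments
            else:
                seen.add(name)
                parts.append(f"(?P<{name}>[^,()]+)")  # a single argument
        position = match.end()
    parts.append(re.escape(pattern[position:]))
    return re.compile("".join(parts))


malloc_call = compile_pattern("malloc($x)").search("p = malloc(n * 4);")
print(malloc_call.group("x"))

memcpy_call = compile_pattern("memcpy($dst, $$REST)").search("memcpy(buf, src, len);")
print(memcpy_call.group("dst"), "|", memcpy_call.group("REST"))
```

A text-based matcher like this breaks on nested parentheses, comments, and string literals, which is one reason structural search works on a syntax tree instead.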

Usage¶

CLI¶

# Ad-hoc pattern search
python -m src.cli patterns search "malloc($x)" --lang c

# Scan with all rules
python -m src.cli patterns scan

# Scan specific rule
python -m src.cli patterns scan --rule unchecked-return

# Generate rule from description
python -m src.cli patterns generate "find unchecked return values" --lang c

# Autofix (dry run)
python -m src.cli patterns fix --dry-run

Programmatic¶

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find unchecked return values", scenario="pattern_search")

for finding in result.get('findings', []):
    print(f"{finding['rule_id']}: {finding['file']}:{finding['line']}")
    print(f"  {finding['message']}")

Combining Scenarios¶

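To pin a question to one scenario, pass the scenario ID through context. To analyze all dimensions in one pass, run the composite audit from the CLI: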

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Force a specific scenario via context
result = copilot.run(
    "Analyze the executor module",
    context={"scenario_id": "scenario_2"}  # security
)
print(f"Answer: {result['answer']}")

# Or run the composite audit across all dimensions
# python -m src.cli audit --db PATH

Next Steps¶

TUI User Guide - General usage
API Reference - Programmatic access