Workflow Scenarios Guide

CodeGraph supports 21 specialized analysis scenarios.

Scenario Overview

#   Scenario                       Use Case
1   Codebase Onboarding            Navigate the codebase for new developers
2   Security Audit                 Comprehensive audit with taint analysis
3   Documentation Generation       Auto-generate technical documentation
4   Feature Development            Guidance for implementing new features
5   Refactoring                    Refactoring recommendations with impact analysis
6   Performance Analysis           Identify performance bottlenecks
7   Test Coverage                  Test coverage analysis and recommendations
8   Compliance Verification        OWASP, GDPR, ISO 27001 compliance checks
9   Code Review                    Automated PR/MR review
10  Cross-Repository Analysis      Cross-module dependency analysis
11  Architecture Analysis          Detect architectural constraint violations
12  Tech Debt Assessment           Quantify technical debt
13  Mass Refactoring               Automated mass refactoring (API migrations)
14  Security Incident Response     Incident investigation with recommendations
15  Debugging Support              Data-flow-based debugging assistance
16  Entry Points & Attack Surface  API entry points and attack surface analysis
17  File Editing                   AST-based precise code editing
18  Code Optimization              Comprehensive optimization (security, refactoring, architecture)
19  Standards Check                Code standards verification
20  Dependency Analysis            Dependency and import analysis
21  Structural Pattern Search      Find code patterns with CPG constraints

1. Codebase Onboarding

Navigate the codebase for new developers: find function/class/struct definitions, explore architecture.

Example Questions

Explain the project architecture
Find method 'heap_insert'
Where is AbortTransaction defined?
Show the definition of RelFileNode struct
Find all methods in file 'xact.c'

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Explain the project architecture", scenario="onboarding")

print(result['answer'])

2. Security Audit

Comprehensive security audit: vulnerability detection, taint analysis, call graph analysis, CWE mapping.

Example Questions

Find SQL injection vulnerabilities
Show potential buffer overflows
Find unsanitized user input
Trace user input to query execution
What functions call LWLockAcquire?

Security Patterns Detected

  • SQL Injection (CWE-89)
  • Buffer Overflow (CWE-120)
  • Command Injection (CWE-78)
  • Format String (CWE-134)
  • Integer Overflow (CWE-190)

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find SQL injection vulnerabilities")

for vuln in result['vulnerabilities']:
    print(f"CWE-{vuln['cwe']}: {vuln['description']}")
    print(f"  File: {vuln['file']}:{vuln['line']}")
    print(f"  Severity: {vuln['severity']}")
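
A question such as "Trace user input to query execution" can be read as a reachability search over data-flow edges. The sketch below is a minimal illustration of that idea, not CodeGraph's taint engine: the `flows` dictionary, the node names, and the `find_taint_paths` helper are all hypothetical.

```python
from collections import deque

def find_taint_paths(flows, sources, sinks, sanitizers=()):
    """Return data-flow paths from any source to any sink that bypass sanitizers."""
    sinks, sanitizers = set(sinks), set(sanitizers)
    paths = []
    queue = deque([src] for src in sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            paths.append(path)
            continue
        for nxt in flows.get(node, []):
            # Stop at sanitizers and avoid revisiting nodes on the same path.
            if nxt in sanitizers or nxt in path:
                continue
            queue.append(path + [nxt])
    return paths

# Hypothetical data-flow edges: value -> values it flows into.
flows = {
    "request.args": ["user_id", "quote_literal"],
    "user_id": ["sql_text"],
    "sql_text": ["execute_query"],
    "quote_literal": ["safe_sql"],
    "safe_sql": ["execute_query"],
}
taint_paths = find_taint_paths(
    flows,
    sources=["request.args"],
    sinks=["execute_query"],
    sanitizers=["quote_literal"],
)
print(taint_paths)
```

Only the path that skips the sanitizer is reported; the branch through `quote_literal` is cut off before it reaches the sink.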

3. Documentation Generation

Auto-generate technical documentation from source code.

Example Questions

Generate API documentation
Document the transaction subsystem
Create a summary of the buffer manager

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Document the transaction subsystem")

print(result['documentation'])
# Markdown-formatted documentation

4. Feature Development

Guidance for implementing new features: placement recommendations, pattern examples, dependency navigation.

Example Questions

Where should I add a new endpoint?
Where should I place a new cache invalidation feature?
Find similar features to buffer management
Show pattern examples for executor subsystem

Features

  • Optimal placement: Recommends the best file and nearby method for new code
  • Pattern examples: Classifies existing methods (Initialization, Handler, Query, Validation, Cleanup patterns)
  • Confidence scoring: Indicates how well the feature description matches the recommended subsystem

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Where should I place a new cache invalidation feature?")
print(result['answer'])

See Feature Development Scenario for detailed examples.


5. Refactoring

Refactoring recommendations with impact analysis: dead code, duplication, extract-method opportunities.

Example Questions

Find unreachable functions
Find duplicate code blocks
Plan refactoring of the buffer manager
Find functions to split

Detected Patterns

  • Functions with no callers (dead code)
  • Unreachable code blocks
  • Copy-paste duplication
  • Long functions that are candidates for method extraction
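
The first pattern above, functions with no callers, reduces to a set difference over a call graph. This is a minimal sketch under the assumption that the call graph is available as a plain dictionary; the function names and the `find_uncalled_functions` helper are hypothetical and do not reflect how CodeGraph stores its graph.

```python
def find_uncalled_functions(call_graph, entry_points=()):
    """Return functions that no other function calls, excluding known entry points."""
    defined = set(call_graph)
    called = {
        callee
        for caller, callees in call_graph.items()
        for callee in callees
        if callee != caller  # a function that only calls itself is still unused
    }
    return sorted(defined - called - set(entry_points))

# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse_args", "run"],
    "parse_args": [],
    "run": ["helper"],
    "helper": [],
    "old_helper": ["helper"],
    "retry": ["retry"],
}
dead = find_uncalled_functions(call_graph, entry_points=["main"])
print(dead)  # ['old_helper', 'retry']
```

Entry points must be supplied explicitly, since a function such as `main` legitimately has no callers inside the codebase.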

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find unused functions")

for dead in result['dead_code']:
    print(f"Unused: {dead['name']} in {dead['file']}")

6. Performance Analysis

Identify performance bottlenecks: cyclomatic complexity, expensive loops, concurrency, memory allocation patterns.

Example Questions

Find N+1 database queries
Find functions with high cyclomatic complexity
Find O(n^2) patterns
Find race conditions
Show lock ordering issues

Metrics Analyzed

  • Cyclomatic complexity
  • Loop nesting depth
  • Function length and call frequency
  • Memory allocation patterns
  • Thread safety (locks, race conditions)
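
To make the first metric concrete, cyclomatic complexity can be approximated as one plus the number of decision points in a function. The sketch below applies that rule to Python source with the standard `ast` module. It is an illustration of the metric only: CodeGraph computes its metrics from the code property graph, and the choice of which node types count as decision points is an assumption here.

```python
import ast

# Node types treated as decision points (an assumption of this sketch).
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                   ast.IfExp, ast.comprehension)

def cyclomatic_complexity(func_node):
    """Approximate complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b or c" adds two extra branches.
            complexity += len(node.values) - 1
    return complexity

def complexity_by_function(source):
    """Map each function name in the source to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

SAMPLE = '''
def classify(n, items):
    if n < 0 or n > 100:
        return "out of range"
    for item in items:
        if item == n:
            return "found"
    return "missing"

def identity(x):
    return x
'''
scores = complexity_by_function(SAMPLE)
print(scores)  # {'classify': 5, 'identity': 1}
```

Nested functions are counted both on their own and as part of the enclosing function, which is acceptable for a rough ranking but not for exact reporting.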

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find functions with complexity > 20")

for func in result['complex_functions']:
    print(f"{func['name']}: complexity={func['complexity']}")

7. Test Coverage

Test coverage analysis and recommendations. Supports importing runtime coverage data from external tools.

Example Questions

Which functions are not covered by tests?
Find untested code
Show functions without tests

Importing Coverage Data

# Import pytest-cov JSON report
python -m src.cli coverage import --file coverage.json --format pytest-cov --db data/projects/postgres.duckdb

# Import lcov trace file
python -m src.cli coverage import --file coverage.lcov --format lcov

# Import Cobertura XML (Java/C#)
python -m src.cli coverage import --file coverage.xml --format cobertura --source-root /project

After importing, the “Find untested code” query automatically switches to hybrid mode, combining runtime coverage_percent values with heuristic test-caller analysis.
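
One way to picture the hybrid mode is as a per-function fallback: use the runtime figure when one was imported, otherwise fall back to the test-caller heuristic. The sketch below assumes exactly that combination rule, which this guide does not specify; the `test_` naming convention, the input shapes, and the `find_untested` helper are hypothetical.

```python
def has_test_caller(callers):
    """Heuristic: a function counts as tested if any caller looks like a test."""
    return any(name.startswith("test_") for name in callers)

def find_untested(callers_by_function, coverage_percent=None):
    """Prefer runtime coverage where present, else fall back to the heuristic."""
    coverage_percent = coverage_percent or {}
    untested = []
    for name, callers in callers_by_function.items():
        if name in coverage_percent:
            covered = coverage_percent[name] > 0.0
        else:
            covered = has_test_caller(callers)
        if not covered:
            untested.append(name)
    return sorted(untested)

# Hypothetical caller lists: function -> names of its callers.
callers = {
    "heap_insert": ["test_heap_insert"],
    "heap_delete": ["ExecDelete"],
    "heap_update": ["test_heap_update"],
}
heuristic_only = find_untested(callers)
hybrid = find_untested(callers, {"heap_delete": 82.5, "heap_update": 0.0})
print(heuristic_only)  # ['heap_delete']
print(hybrid)          # ['heap_update']
```

The two results differ in both directions: runtime data clears `heap_delete`, which the heuristic flagged, and exposes `heap_update`, whose test caller never actually executed it.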

CPG-Based Test Recommendations

  • Branch coverage: Counts IF/FOR/SWITCH control structures and estimates required test cases
  • Parameter boundaries: Maps parameter types to boundary value suggestions (zero, null, empty, max)
  • Error paths: Counts TRY blocks and multiple RETURN statements indicating error handling
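
The three recommendation types above can be sketched as simple counting over node kinds plus a type-to-boundary lookup. The estimation formula (branches plus one), the boundary table, and the input format are assumptions made for illustration and are not taken from CodeGraph's implementation.

```python
from collections import Counter

# Hypothetical mapping from parameter type to boundary values worth testing.
BOUNDARIES = {
    "int": ["zero", "max"],
    "pointer": ["null"],
    "string": ["null", "empty"],
    "list": ["empty"],
}

def recommend_tests(node_kinds, param_types):
    """Estimate test effort from control-structure counts and parameter types."""
    counts = Counter(node_kinds)
    branches = counts["IF"] + counts["FOR"] + counts["SWITCH"]
    # Every TRY block and every RETURN beyond the first hints at an error path.
    error_paths = counts["TRY"] + max(counts["RETURN"] - 1, 0)
    return {
        "estimated_test_cases": branches + 1,
        "error_paths": error_paths,
        "boundaries": {
            name: BOUNDARIES.get(ptype, [])
            for name, ptype in param_types.items()
        },
    }

rec = recommend_tests(
    ["IF", "FOR", "IF", "RETURN", "TRY", "RETURN"],
    {"count": "int", "name": "string"},
)
print(rec)
```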

8. Compliance Verification

Verify code compliance with standards: OWASP Top 10, GDPR, ISO 27001.

Example Questions

Check compliance with OWASP Top 10
Verify GDPR data handling requirements
Check ISO 27001 compliance

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Check compliance with OWASP Top 10")
print(result['answer'])

9. Code Review

Automated code review for pull requests and merge requests (PR/MR).

Example Questions

Review this PR for issues
Find potential bugs in this change
Check for style violations
Analyze test coverage for changes

Usage

python demo_patch_review.py --patch changes.diff

Or programmatically:

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Review changes in path/to/changes.diff")

for finding in result['findings']:
    print(f"{finding['severity']}: {finding['description']}")

10. Cross-Repository Analysis

Cross-module dependency analysis and detection of code duplicated between repositories.

Example Questions

Find duplicate code between repo A and B
Show cross-module dependencies

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find duplicate code between repo A and B")
print(result['answer'])

11. Architecture Analysis

Detect architectural constraint violations, analyze subsystems, layers, and dependencies.

Example Questions

Find circular dependencies
Map the subsystem architecture
Show layer boundaries
Find architectural violations
Show subsystem diagram

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Map the PostgreSQL architecture")

for subsystem in result['subsystems']:
    print(f"{subsystem['name']}: {subsystem['description']}")

12. Tech Debt Assessment

Quantify technical debt.

Example Questions

Assess technical debt of this module
Find code with excessive coupling
Show modules needing refactoring
Identify maintenance hotspots

Debt Indicators

  • High complexity
  • Deep nesting
  • Long functions
  • High coupling
  • Missing error handling

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find technical debt hotspots")

for debt in result['debt_items']:
    print(f"{debt['location']}: {debt['type']} (severity: {debt['severity']})")

13. Mass Refactoring

Automated mass refactoring: API migrations, renames.

Example Questions

Plan migration from v1 to v2 API
Rename function X to Y across all files

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Plan migration from v1 to v2 API")
print(result['answer'])

14. Security Incident Response

Investigate security incidents: call-path tracing from entry points to vulnerabilities, CVE impact analysis, blast radius calculation, Mermaid attack path diagrams, taint flow analysis.

Example Questions

Trace the impact of CVE-XXXX
Find all code paths affected by this vulnerability
Show exploitation paths
Trace attack paths to vulnerable function
Find entry points that reach parse_query
Identify affected functions

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Trace impact of vulnerability in parse_query")

# Attack paths from entry points to vulnerability
for path in result['metadata'].get('attack_paths', []):
    print(f"{path.entry_point} -> {path.vulnerability} (chain: {path.chain_length})")
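
Call-path tracing and blast radius are both graph searches: the first walks forward from an entry point to the vulnerable function, the second walks backward from the vulnerable function to everything that can reach it. The sketch below shows both on a small dictionary-based call graph and renders one path as a Mermaid diagram. Apart from `parse_query`, which appears in the questions above, the function names, the graph, and the helpers are hypothetical, and this is not CodeGraph's implementation.

```python
from collections import deque

def shortest_call_path(call_graph, entry, target):
    """Breadth-first search for the shortest call chain from entry to target."""
    queue, seen = deque([[entry]]), {entry}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None

def blast_radius(call_graph, target):
    """Every function that can reach target through some chain of calls."""
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    affected, stack = set(), [target]
    while stack:
        for caller in callers.get(stack.pop(), ()):
            if caller not in affected:
                affected.add(caller)
                stack.append(caller)
    affected.discard(target)
    return affected

def to_mermaid(path):
    """Render a call path as a left-to-right Mermaid graph."""
    edges = [f"    {a} --> {b}" for a, b in zip(path, path[1:])]
    return "\n".join(["graph LR"] + edges)

# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "handle_request": ["run_statement"],
    "run_statement": ["parse_query", "log_statement"],
    "background_worker": ["cleanup"],
    "parse_query": ["tokenize"],
}
attack_path = shortest_call_path(call_graph, "handle_request", "parse_query")
print(attack_path, len(attack_path))
print(blast_radius(call_graph, "parse_query"))
print(to_mermaid(attack_path))
```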

15. Debugging Support

Data-flow-based debugging assistance.

Example Questions

Find all elog(ERROR) locations
Show potential deadlocks
Find missing lock acquisitions

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find all elog(ERROR) locations")
print(result['answer'])

16. Entry Points & Attack Surface

API entry points and attack surface analysis: exported functions, hook functions.

Example Questions

Which functions accept user input?
Find all exported functions
Show main API entry points
Find hook functions

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find API entry points")

for entry in result['entry_points']:
    print(f"Entry: {entry['name']} ({entry['type']})")

17. File Editing

AST-based precise code editing.

Example Questions

Rename function X to Y across all files

18. Code Optimization

Comprehensive optimization: composite scenario that runs sub-scenarios S02, S05, S06, S11, S12 in parallel (60s timeout).

Example Questions

Optimize the authorization module
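
The parallel execution with a shared timeout described for this composite scenario can be sketched with the standard `concurrent.futures` module. This is an illustration of the pattern only: the `run_parallel` helper and the stub sub-scenarios are hypothetical, and how CodeGraph merges sub-scenario results or handles stragglers is not stated in this guide.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def run_parallel(sub_scenarios, query, timeout_s=60.0):
    """Run each sub-scenario in its own thread and keep what finishes in time."""
    pool = ThreadPoolExecutor(max_workers=len(sub_scenarios))
    futures = {pool.submit(fn, query): sid for sid, fn in sub_scenarios.items()}
    done, not_done = wait(futures, timeout=timeout_s)

    results = {}
    for future in done:
        error = future.exception()
        if error is not None:
            results[futures[future]] = {"error": repr(error)}
        else:
            results[futures[future]] = future.result()
    timed_out = sorted(futures[f] for f in not_done)

    # Do not block on stragglers; running threads finish in the background.
    pool.shutdown(wait=False)
    return results, timed_out

# Hypothetical stand-ins for real sub-scenarios.
demo = {
    "S02": lambda q: f"S02 result for {q}",
    "S05": lambda q: f"S05 result for {q}",
    "S06": lambda q: time.sleep(0.5),  # simulates a slow sub-scenario
}
results, timed_out = run_parallel(demo, "authorization module", timeout_s=0.1)
print(sorted(results), timed_out)
```

Threads cannot be interrupted once started, so a sub-scenario that exceeds the timeout is reported as timed out while its thread runs to completion in the background.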

19. Standards Check

Code standards verification: composite scenario that runs S08, S17, S18 sequentially (45s timeout).

Example Questions

Check code against project standards
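
Sequential execution under a time budget can be sketched as a loop that checks a deadline before each step. The sketch assumes the 45-second timeout applies to the whole sequence rather than to each sub-scenario, which this guide does not specify; the `run_sequential` helper and the stub steps are hypothetical.

```python
import time

def run_sequential(steps, query, timeout_s=45.0, clock=time.monotonic):
    """Run steps in order, skipping any that would start after the deadline."""
    deadline = clock() + timeout_s
    results, skipped = {}, []
    for sid, fn in steps:
        if clock() >= deadline:
            skipped.append(sid)
            continue
        results[sid] = fn(query)
    return results, skipped

# Hypothetical stand-ins for the three sub-scenarios.
steps = [
    (sid, lambda q, sid=sid: f"{sid} checked {q}")
    for sid in ("S08", "S17", "S18")
]
results, skipped = run_sequential(steps, "project standards")
print(results, skipped)
```

The budget is only checked between steps, so a single long-running step can still overrun the deadline; the `clock` parameter exists to make the timing logic testable.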

20. Dependency Analysis

Dependency and import analysis: module dependency graph.

Example Questions

Show the module dependency tree
What modules depend on storage?
Find circular dependencies

Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Show dependencies of transaction module")

for dep in result['dependencies']:
    print(f"{dep['from']} -> {dep['to']}")
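
For the "Find circular dependencies" question, the underlying technique is cycle detection on a directed graph. The sketch below uses a depth-first search with three node states. The module names and the `find_cycle` helper are hypothetical, and the input is assumed to be a plain dictionary rather than CodeGraph's own dependency data.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of modules, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color, stack = {}, []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path from nxt to here, closed.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Hypothetical module dependencies: module -> modules it depends on.
deps = {
    "executor": ["storage"],
    "storage": ["transaction"],
    "transaction": ["executor"],
    "utils": [],
}
print(find_cycle(deps))  # ['executor', 'storage', 'transaction', 'executor']
```

The recursion depth grows with the longest dependency chain, so a very deep graph would call for an iterative variant.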

21. Structural Pattern Search

Find code matching structural patterns with CPG-aware constraints (data flow, call graph, types, domain annotations).

Example Questions

Find unchecked return values
Find malloc without free
Show functions matching error-handling anti-patterns
Find SQL query construction without parameterization
Find all functions with cyclomatic complexity > 20

Pattern Types

  • Syntactic: Tree-sitter CST patterns with metavariables ($VAR, $$ARGS, $_)
  • CPG-constrained: Patterns with data flow, call graph, type, and domain constraints
  • YAML rules: Pre-defined rules in configs/rules/ (190 rules across 14 languages)
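
To give a feel for metavariables, the sketch below translates a pattern into a regular expression. This is a deliberately simplified text-level approximation, not Tree-sitter CST matching and not CodeGraph's matcher. It assumes the common convention that `$VAR` binds a single identifier, `$$ARGS` binds an arbitrary argument list, and `$_` matches an identifier without binding it.

```python
import re

_META = re.compile(r"\$\$(\w+)|\$_(?!\w)|\$(\w+)")

def compile_pattern(pattern):
    """Translate a metavariable pattern into a regex (text-level approximation)."""
    parts, pos = [], 0
    for m in _META.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        if m.group(1):    # $$ARGS: capture everything up to the next literal
            parts.append(f"(?P<{m.group(1)}>.*?)")
        elif m.group(2):  # $VAR: capture a single identifier
            parts.append(f"(?P<{m.group(2)}>\\w+)")
        else:             # $_: match an identifier without capturing it
            parts.append(r"\w+")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

alloc = compile_pattern("malloc($x)").search("p = malloc(size);")
print(alloc.group("x"))

log = compile_pattern("elog($_, $$ARGS)").search('elog(ERROR, "bad %s", name);')
print(log.group("ARGS"))
```

A regex cannot see nesting, so `malloc(n * 2)` or arguments containing parentheses would not match here, and reusing a metavariable name within one pattern raises an error; handling those cases is what a syntax-tree matcher is for.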

Usage

CLI

# Ad-hoc pattern search
python -m src.cli patterns search "malloc($x)" --lang c

# Scan with all rules
python -m src.cli patterns scan

# Scan specific rule
python -m src.cli patterns scan --rule unchecked-return

# Generate rule from description
python -m src.cli patterns generate "find unchecked return values" --lang c

# Autofix (dry run)
python -m src.cli patterns fix --dry-run

Programmatic

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find unchecked return values", scenario="pattern_search")

for finding in result.get('findings', []):
    print(f"{finding['rule_id']}: {finding['file']}:{finding['line']}")
    print(f"  {finding['message']}")

Combining Scenarios

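To steer a query toward a particular scenario, pass a scenario ID through the context argument. To cover all dimensions at once, run the composite audit from the CLI: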

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Force a specific scenario via context
result = copilot.run(
    "Analyze the executor module",
    context={"scenario_id": "scenario_2"}  # security
)
print(f"Answer: {result['answer']}")

# Or run the composite audit across all dimensions
# python -m src.cli audit --db PATH
