User Guide

Complete guide to using CodeGraph for code analysis.

Overview

CodeGraph answers natural language questions about codebases by combining:

- Semantic search - find code by meaning and intent
- Structural search - traverse call graphs and data flow
- LLM synthesis - generate human-readable answers

Basic Usage

Interactive Mode

python examples/demo_simple.py

Enter questions at the prompt:

> What does CommitTransaction do?
> Find methods that handle memory allocation
> Show the call chain from executor to storage

Programmatic Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
question = "What methods handle transaction commits?"
result = copilot.run(question)

print(f"Answer: {result['answer']}")
print(f"Confidence: {result.get('confidence', 'N/A')}")
print(f"Intent: {result.get('intent')}")

The run() method accepts optional parameters:

result = copilot.run(
    query="Find SQL injection vulnerabilities",
    context={"scenario_id": "scenario_2"},  # Force security scenario
    language="en",                           # Response language (en/ru)
)

Question Types

Definition Queries

Find where code is defined:

Find method 'heap_insert'
Where is AbortTransaction defined?
Show me the RelationGetBufferForTuple function

Relationship Queries

Understand code relationships:

What methods call LWLockAcquire?
Find callers of MemoryContextCreate
What does heap_insert call?

Semantic Queries

Ask about behavior and purpose:

How does PostgreSQL handle MVCC?
Explain the transaction commit process
What mechanism ensures durability?

Security Queries

Find vulnerabilities:

Find potential SQL injection points
Show unsanitized user input paths
Find buffer overflow risks
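
As an illustration of how keyword-based routing can distinguish these question types, here is a toy sketch. This is not the project's actual classifier (see classification_method below); the keyword lists and function name are assumptions made for the example:

```python
# Illustrative sketch of keyword-based intent routing, NOT the
# project's actual classifier. Keyword lists are assumptions.
INTENT_KEYWORDS = {
    "definition": ("where is", "find method", "show me the"),
    "relationship": ("call", "callers of"),
    "security": ("injection", "unsanitized", "overflow", "vulnerab"),
}

def guess_intent(question: str) -> str:
    q = question.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return intent
    return "semantic"  # fall back to semantic search

print(guess_intent("Find buffer overflow risks"))  # security
```

A real classifier also handles ambiguous queries, which is why the result carries a confidence score (see Understanding Results).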

Understanding Results

Result Structure

copilot.run() returns a MultiScenarioState dictionary with these key fields:

{
    # Core output
    "answer": "CommitTransaction finalizes a transaction by...",
    "confidence": 0.85,                # Intent classification confidence (0.0–1.0)
    "intent": "security",              # Detected intent (e.g., security, performance)
    "scenario_id": "scenario_2",       # Executed scenario ID

    # Supporting data
    "evidence": ["xact.c:1234 — CommitTransaction calls..."],
    "cpg_results": [...],              # Raw CPG query results
    "metadata": {...},                 # Scenario-specific metadata

    # Classification
    "classification_method": "keyword", # "keyword" or "llm"

    # Error handling
    "error": None,                     # Error message if any
}

The full MultiScenarioState TypedDict (src/workflow/state.py) contains 21 keys including query, context, language, subsystems, methods, call_graph, retrieved_functions, retry_count, enrichment_config, vector_store, db_path, collection_prefix, and pre_retrieval_results.
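
A defensive way to consume this structure is to tolerate missing optional keys. The sample dictionary below mirrors the fields shown above with illustrative values; the summarize helper is ours, not part of the API:

```python
# Sample result mirroring the structure above; values are illustrative.
result = {
    "answer": "CommitTransaction finalizes a transaction by...",
    "confidence": 0.85,
    "intent": "security",
    "scenario_id": "scenario_2",
    "evidence": ["xact.c:1234 — CommitTransaction calls..."],
    "error": None,
}

def summarize(result: dict) -> str:
    """Format the key fields, tolerating missing optional keys."""
    if result.get("error"):
        return f"FAILED: {result['error']}"
    lines = [
        f"Intent: {result.get('intent', 'unknown')} "
        f"(confidence {result.get('confidence', 0.0):.2f})",
        result.get("answer", "(no answer)"),
    ]
    for ref in result.get("evidence", []):
        lines.append(f"  evidence: {ref}")
    return "\n".join(lines)

print(summarize(result))
```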

Confidence Levels

The confidence field reflects the intent classifier’s certainty, not answer quality:

Level     Meaning
> 0.9     High confidence - keyword match or forced scenario
0.7-0.9   Good confidence - LLM classification
0.5-0.7   Moderate - ambiguous intent
< 0.5     Low - fallback to default scenario
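
The thresholds above can be turned into a small helper. This is a sketch: the levels come from the table, but the function name and exact boundary handling are our assumptions:

```python
def confidence_level(confidence: float) -> str:
    """Map a confidence score to the levels described above (a sketch)."""
    if confidence > 0.9:
        return "high"      # keyword match or forced scenario
    if confidence >= 0.7:
        return "good"      # LLM classification
    if confidence >= 0.5:
        return "moderate"  # ambiguous intent
    return "low"           # fallback to default scenario

print(confidence_level(0.85))  # good
```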

Advanced Features

Hybrid Search Mode

Combine semantic (ChromaDB) and structural (DuckDB CPG) search:

from src.agents.retriever_agent import RetrieverAgent
from src.retrieval.vector_store import VectorStoreReal
from src.agents.analyzer_agent import AnalyzerAgent
from src.services.cpg import CPGQueryService

# Initialize dependencies
vector_store = VectorStoreReal()
analyzer_agent = AnalyzerAgent()
cpg_service = CPGQueryService()

# Create retriever with hybrid mode
retriever = RetrieverAgent(
    vector_store=vector_store,
    analyzer_agent=analyzer_agent,
    cpg_service=cpg_service,   # Enables hybrid retrieval
    enable_hybrid=True,
)

# Run hybrid retrieval
results = retriever.retrieve_hybrid(
    question="Find memory allocation patterns",
    mode="hybrid",             # "hybrid", "vector_only", or "graph_only"
    query_type="structural",   # Hint: "semantic", "structural", "security"
    top_k=10,
)

Multi-Domain Analysis

Switch between codebases using ProjectManager:

from src.project_manager import ProjectManager

pm = ProjectManager()

# Switch to a registered project (activates DB, collections, domain)
pm.switch_project("postgresql")

# Analyze in the context of this project
from src.workflow import MultiScenarioCopilot
copilot = MultiScenarioCopilot()
result = copilot.run("Find buffer overflow risks")

# Switch to another project
pm.switch_project("linux_kernel")
result = copilot.run("Find use-after-free patterns")

Projects are registered in the projects section of config.yaml, with db_path, source_path, language, and domain fields.
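
A sketch of what such an entry might look like. The field names come from the list above; the project name, paths, and values are illustrative assumptions, not actual defaults:

```yaml
# Illustrative sketch of a projects entry in config.yaml.
# Paths and values are assumptions; only the field names are documented.
projects:
  postgresql:
    db_path: data/projects/postgres.duckdb
    source_path: /path/to/postgres/src
    language: c
    domain: database
```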

Scenario-Based Analysis

Use the copilot for scenario-based analysis (intent is detected automatically):

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Security analysis - intent detected automatically
result = copilot.run("Find SQL injection vulnerabilities")
print(f"Intent: {result.get('intent')}")  # → 'security'

# Performance analysis
result = copilot.run("Find functions with high cyclomatic complexity")
print(f"Intent: {result.get('intent')}")  # → 'performance'

# Force a specific scenario (e.g., security = scenario_2)
result = copilot.run(
    "Analyze authentication module",
    context={"scenario_id": "scenario_2"},
)

21 scenarios are available (S01 onboarding through S21 interface_docs_sync). See Scenarios for the full list.

Pattern Search

Search for code patterns using GoCPG’s tree-sitter CST matching with CPG-aware constraints:

Using the Patterns CLI Programmatically

import asyncio
from src.services.gocpg import GoCPGClient

async def main():
    client = GoCPGClient()

    # Ad-hoc pattern search (no CPG DB needed)
    results = await client.search(pattern="malloc($x)", language="c", max_results=50)

    # CPG-aware scan with rules
    results = await client.scan(
        db_path="data/projects/postgres.duckdb",
        rule_id="unchecked-return",
    )
    print(results)

asyncio.run(main())

Pattern Findings via CPG Query Service

from src.services.cpg import CPGQueryService

cpg = CPGQueryService()

# Query persisted pattern findings
findings = cpg.get_pattern_findings(severity="high")
stats = cpg.get_pattern_statistics()

Best Practices

Writing Effective Questions

Good questions:

- “What functions handle memory allocation in the buffer manager?”
- “Show the call path from parser to executor”
- “Find unsanitized inputs that reach database queries”

Less effective:

- “Tell me about the code” (too vague)
- “Fix this bug” (action request, not analysis)
- “Everything about transactions” (too broad)

Optimizing Performance

  1. Be specific - Narrow questions get faster answers
  2. Use structural queries - When you know the pattern
  3. Enable caching - For repeated similar queries
  4. Limit scope - Add file or subsystem constraints

Interpreting Answers

  1. Check evidence - Verify the code references in result['evidence']
  2. Consider confidence - Lower confidence = verify manually
  3. Follow up - Ask clarifying questions
  4. Cross-reference - Compare with actual code
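
The checks above can be combined into a simple acceptance gate. This is a sketch: the 0.7 threshold and function name are assumptions, not project defaults:

```python
def needs_manual_review(result: dict, min_confidence: float = 0.7) -> bool:
    """Flag answers that should be cross-checked against the code.

    An answer warrants manual review when the run errored, classification
    confidence is low, or no supporting evidence was returned. The 0.7
    threshold is an assumption, not a project default.
    """
    if result.get("error"):
        return True
    if result.get("confidence", 0.0) < min_confidence:
        return True
    return not result.get("evidence")

print(needs_manual_review({"confidence": 0.9, "evidence": ["xact.c:1234"]}))  # False
```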

Workflow Integration

CI/CD Integration

# .github/workflows/code-analysis.yml
- name: Run Code Analysis
  run: |
    python -c "
    from src.workflow import MultiScenarioCopilot
    copilot = MultiScenarioCopilot()
    result = copilot.run('Find potential security issues')
    if result.get('error'):
        print(f'Analysis error: {result[\"error\"]}')
        raise SystemExit(1)
    print(result['answer'])
    "

Code Review

# Run automated patch review demo
python examples/demo_patch_review.py --db data/projects/myproject.duckdb

# Available flags:
#   --db PATH       Path to DuckDB CPG database
#   --no-dod        Disable Definition of Done functionality
#   --auto-dod      Auto-generate DoD instead of extracting from PR body
#   --interactive   Enable interactive DoD confirmation prompts

Documentation Generation

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Document the transaction subsystem")

# result['answer'] contains generated documentation
print(result['answer'])

Next Steps