User Guide¶
Complete guide to using CodeGraph for code analysis.
Table of Contents¶
- Overview
- Basic Usage
  - Interactive Mode
  - Programmatic Usage
- Question Types
  - Definition Queries
  - Relationship Queries
  - Semantic Queries
  - Security Queries
- Understanding Results
  - Result Structure
  - Confidence Levels
- Advanced Features
  - Hybrid Search Mode
  - Multi-Domain Analysis
  - Scenario-Based Analysis
- Best Practices
  - Writing Effective Questions
  - Optimizing Performance
  - Interpreting Answers
- Workflow Integration
  - CI/CD Integration
  - Code Review
  - Documentation Generation
- Next Steps
Overview¶
CodeGraph answers natural language questions about codebases by combining:

- Semantic search - Find code by meaning and intent
- Structural search - Traverse call graphs and data flow
- LLM synthesis - Generate human-readable answers
Basic Usage¶
Interactive Mode¶
python examples/demo_simple.py
Enter questions at the prompt:
> What does CommitTransaction do?
> Find methods that handle memory allocation
> Show the call chain from executor to storage
Programmatic Usage¶
from src.workflow import MultiScenarioCopilot
copilot = MultiScenarioCopilot()
question = "What methods handle transaction commits?"
result = copilot.run(question)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result.get('confidence', 'N/A')}")
Question Types¶
Definition Queries¶
Find where code is defined:
Find method 'heap_insert'
Where is AbortTransaction defined?
Show me the RelationGetBufferForTuple function
Relationship Queries¶
Understand code relationships:
What methods call LWLockAcquire?
Find callers of MemoryContextCreate
What does heap_insert call?
Semantic Queries¶
Ask about behavior and purpose:
How does PostgreSQL handle MVCC?
Explain the transaction commit process
What mechanism ensures durability?
Security Queries¶
Find vulnerabilities:
Find potential SQL injection points
Show unsanitized user input paths
Find buffer overflow risks
Understanding Results¶
Result Structure¶
{
  "answer": "CommitTransaction finalizes a transaction by...",
  "confidence": 0.85,
  "sources": [
    {"method": "CommitTransaction", "file": "xact.c", "line": 1234},
    {"method": "CommitTransactionCommand", "file": "xact.c", "line": 1456}
  ],
  "query_used": "cpg.method.name('CommitTransaction')...",
  "execution_time_ms": 150
}
Confidence Levels¶
| Level | Meaning |
|---|---|
| > 0.9 | High confidence - direct match |
| 0.7-0.9 | Good confidence - semantic match |
| 0.5-0.7 | Moderate - inference required |
| < 0.5 | Low - best effort answer |
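In scripts, the confidence value can be used to decide whether to trust an answer automatically or flag it for manual review. A minimal sketch, assuming the result dictionary shape shown above; the thresholds simply mirror the table:

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("What does CommitTransaction do?")

# Map the numeric confidence onto the levels from the table above
confidence = result.get("confidence", 0.0)
if confidence > 0.9:
    print("High confidence - direct match")
elif confidence >= 0.7:
    print("Good confidence - semantic match")
elif confidence >= 0.5:
    print("Moderate confidence - verify the inference")
else:
    print("Low confidence - treat as a best-effort answer")

print(result["answer"])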
Advanced Features¶
Hybrid Search Mode¶
Combine semantic and structural search:
from src.agents.retriever_agent import RetrieverAgent

# Blend semantic (vector) and structural (graph) relevance
retriever = RetrieverAgent(
    enable_hybrid=True,
    vector_weight=0.6,   # weight given to semantic (embedding) matches
    graph_weight=0.4     # weight given to structural (call/data-flow) matches
)

results = retriever.retrieve_hybrid(
    question="Find memory allocation patterns",
    mode="hybrid",
    query_type="structural"
)
Multi-Domain Analysis¶
Switch between codebases:
from src.config import CPGConfig
# Analyze PostgreSQL
pg_config = CPGConfig()
pg_config.set_cpg_type("postgresql")
# Analyze Linux Kernel
lk_config = CPGConfig()
lk_config.set_cpg_type("linux_kernel")
Scenario-Based Analysis¶
Use the copilot for scenario-based analysis (intent is detected automatically):
from src.workflow import MultiScenarioCopilot
copilot = MultiScenarioCopilot()
# Security analysis - intent detected automatically
result = copilot.run("Find SQL injection vulnerabilities")
print(f"Intent: {result.get('intent')}") # → 'security'
# Performance analysis
result = copilot.run("Find functions with high cyclomatic complexity")
print(f"Intent: {result.get('intent')}") # → 'performance'
Best Practices¶
Writing Effective Questions¶
Good questions:

- “What functions handle memory allocation in the buffer manager?”
- “Show the call path from parser to executor”
- “Find unsanitized inputs that reach database queries”
Less effective:

- “Tell me about the code” (too vague)
- “Fix this bug” (action request, not analysis)
- “Everything about transactions” (too broad)
Optimizing Performance¶
- Be specific - Narrow questions get faster answers
- Use structural queries - When you know the pattern
- Enable caching - For repeated similar queries
- Limit scope - Add file or subsystem constraints
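As an illustration of the scope-limiting tip above, naming a file or subsystem directly in the question narrows the search; a sketch using the MultiScenarioCopilot API shown earlier (the example questions are hypothetical):

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Broad question - large search space, slower answer
broad = copilot.run("How is locking implemented?")

# Scoped question - constrained to a file and subsystem, faster and sharper
scoped = copilot.run("How does lwlock.c implement lightweight locks in the storage subsystem?")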
Interpreting Answers¶
- Check sources - Verify the code references
- Consider confidence - Lower confidence = verify manually
- Follow up - Ask clarifying questions
- Cross-reference - Compare with actual code
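To make the source check concrete, the returned references can be printed as file:line pairs for manual cross-checking; a sketch assuming the result structure described in Understanding Results:

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("What methods handle transaction commits?")

# Print each cited code location so it can be verified against the source tree
for source in result.get("sources", []):
    print(f"{source['file']}:{source['line']}  {source['method']}")

# Lower confidence means the answer should be cross-referenced by hand
if result.get("confidence", 0.0) < 0.7:
    print("Verify this answer manually against the code")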
Workflow Integration¶
CI/CD Integration¶
# .github/workflows/code-analysis.yml
- name: Run Code Analysis
  run: |
    python -c "
    from src.workflow import MultiScenarioCopilot
    copilot = MultiScenarioCopilot()
    result = copilot.run('Find potential security issues')
    if result.get('critical_count', 0) > 0:
        exit(1)
    "
Code Review¶
# Analyze a patch
python examples/demo_patch_review.py --patch changes.diff
# Output: Security, performance, and architecture findings
Documentation Generation¶
from src.workflow import MultiScenarioCopilot
copilot = MultiScenarioCopilot()
result = copilot.run("Document the transaction subsystem")
# result['answer'] contains generated documentation
print(result['answer'])
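Since result['answer'] is a plain string, it can be written straight to a file; a minimal sketch (the output filename is an arbitrary example):

# Save the generated documentation for review or publishing
with open("transaction_subsystem.md", "w") as f:
    f.write(result["answer"])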
Next Steps¶
- Scenarios - All 16 use cases
- CLI Guide - Command-line interface
- API Reference - Programmatic access
- Troubleshooting - Common issues