User Guide

Complete guide to using CodeGraph for code analysis.

Overview

CodeGraph answers natural-language questions about codebases by combining:

- Semantic search: find code by meaning and intent
- Structural search: traverse call graphs and data flow
- LLM synthesis: generate human-readable answers

Basic Usage

Interactive Mode

python examples/demo_simple.py

Enter questions at the prompt:

> What does CommitTransaction do?
> Find methods that handle memory allocation
> Show the call chain from executor to storage

Programmatic Usage

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
question = "What methods handle transaction commits?"
result = copilot.run(question)

print(f"Answer: {result['answer']}")
print(f"Confidence: {result.get('confidence', 'N/A')}")

Question Types

Definition Queries

Find where code is defined:

Find method 'heap_insert'
Where is AbortTransaction defined?
Show me the RelationGetBufferForTuple function

Relationship Queries

Understand code relationships:

What methods call LWLockAcquire?
Find callers of MemoryContextCreate
What does heap_insert call?
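Conceptually, these questions are graph lookups. "Callers of X" follows call edges backwards, "what does X call" follows them forwards, and a call chain (as in "Show the call chain from executor to storage") is a path search. The sketch below illustrates this on a small, hand-written call graph. The edges are hypothetical and not extracted from PostgreSQL, and the helper names (`callers`, `callees`, `call_chain`) are illustrative only. CodeGraph's own graph queries may work differently.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees (illustrative edges only)
CALL_GRAPH: dict[str, list[str]] = {
    "heap_insert": ["RelationGetBufferForTuple", "LWLockAcquire"],
    "RelationGetBufferForTuple": ["LWLockAcquire"],
    "CommitTransaction": ["LWLockAcquire", "MemoryContextCreate"],
}


def callees(graph: dict[str, list[str]], name: str) -> list[str]:
    """Answer 'What does <name> call?' by following edges forwards."""
    return sorted(graph.get(name, []))


def callers(graph: dict[str, list[str]], name: str) -> list[str]:
    """Answer 'What methods call <name>?' by following edges backwards."""
    return sorted(src for src, dsts in graph.items() if name in dsts)


def call_chain(graph: dict[str, list[str]], start: str, goal: str):
    """Breadth-first search for the shortest call path from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path exists


print(callers(CALL_GRAPH, "LWLockAcquire"))
# ['CommitTransaction', 'RelationGetBufferForTuple', 'heap_insert']
print(callees(CALL_GRAPH, "heap_insert"))
# ['LWLockAcquire', 'RelationGetBufferForTuple']
print(call_chain(CALL_GRAPH, "heap_insert", "LWLockAcquire"))
# ['heap_insert', 'LWLockAcquire']
```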

Semantic Queries

Ask about behavior and purpose:

How does PostgreSQL handle MVCC?
Explain the transaction commit process
What mechanism ensures durability?
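Semantic search of this kind usually works by comparing vector embeddings of the question and of code snippets, then ranking snippets by similarity. The sketch below shows that ranking step with cosine similarity. It uses hand-picked 3-dimensional toy vectors in place of a real embedding model, so the numbers are purely illustrative and say nothing about how CodeGraph actually computes or stores embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy embeddings (made up); a real system would obtain these from an embedding model
snippets = {
    "CommitTransaction": [0.9, 0.1, 0.0],
    "heap_insert": [0.2, 0.9, 0.1],
    "MemoryContextCreate": [0.0, 0.2, 0.9],
}
# Stands in for the embedding of "Explain the transaction commit process"
question_vec = [0.8, 0.2, 0.0]

# Most similar snippet first
ranked = sorted(snippets, key=lambda name: cosine(question_vec, snippets[name]), reverse=True)
print(ranked)  # ['CommitTransaction', 'heap_insert', 'MemoryContextCreate']
```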

Security Queries

Find vulnerabilities:

Find potential SQL injection points
Show unsanitized user input paths
Find buffer overflow risks

Understanding Results

Result Structure

{
    "answer": "CommitTransaction finalizes a transaction by...",
    "confidence": 0.85,
    "sources": [
        {"method": "CommitTransaction", "file": "xact.c", "line": 1234},
        {"method": "CommitTransactionCommand", "file": "xact.c", "line": 1456}
    ],
    "query_used": "cpg.method.name('CommitTransaction')...",
    "execution_time_ms": 150
}
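Because `sources` is a list of plain dictionaries, a result can be post-processed with ordinary Python. The sketch below formats each source as `file:line (method)` using only the keys shown in the example structure. The helper name `format_sources` is hypothetical, and treating `sources` as optional is an assumption.

```python
def format_sources(result: dict) -> list[str]:
    """Render each source reference as 'file:line (method)'."""
    # Assumption: 'sources' may be absent for some answers, so default to []
    return [
        f"{src['file']}:{src['line']} ({src['method']})"
        for src in result.get("sources", [])
    ]


example = {
    "answer": "CommitTransaction finalizes a transaction by...",
    "confidence": 0.85,
    "sources": [
        {"method": "CommitTransaction", "file": "xact.c", "line": 1234},
        {"method": "CommitTransactionCommand", "file": "xact.c", "line": 1456},
    ],
    "execution_time_ms": 150,
}

for line in format_sources(example):
    print(line)
# xact.c:1234 (CommitTransaction)
# xact.c:1456 (CommitTransactionCommand)
```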

Confidence Levels

Confidence    Meaning
> 0.9         High: direct match
0.7 to 0.9    Good: semantic match
0.5 to 0.7    Moderate: inference required
< 0.5         Low: best-effort answer
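To apply these bands in a script, the score can be mapped to a label. The sketch below is a hypothetical helper, not part of CodeGraph. The table does not say which band a boundary value such as 0.7 belongs to, so the choices here (0.9 and 0.7 count as "good", 0.5 as "moderate") are assumptions.

```python
def confidence_level(score: float) -> str:
    """Map a confidence score to the bands in the table above.

    Boundary handling is an assumption: 0.9 and 0.7 count as 'good',
    and 0.5 counts as 'moderate'.
    """
    if score > 0.9:
        return "high"
    if score >= 0.7:
        return "good"
    if score >= 0.5:
        return "moderate"
    return "low"


print(confidence_level(0.85))  # good
```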

Advanced Features

Hybrid Search Mode

Combine semantic and structural search:

from src.agents.retriever_agent import RetrieverAgent

retriever = RetrieverAgent(
    enable_hybrid=True,
    vector_weight=0.6,
    graph_weight=0.4
)

results = retriever.retrieve_hybrid(
    question="Find memory allocation patterns",
    mode="hybrid",
    query_type="structural"
)
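The `vector_weight` and `graph_weight` parameters suggest a weighted blend of semantic and structural results. The sketch below shows one common way to do that, a linear combination of per-result scores. It illustrates the idea only and is not RetrieverAgent's actual implementation. It assumes both score sets are already normalized to [0, 1] and that a result missing from one list scores 0 there. The function name `fuse_scores` and the scores are made up.

```python
def fuse_scores(
    vector_hits: dict[str, float],
    graph_hits: dict[str, float],
    vector_weight: float = 0.6,
    graph_weight: float = 0.4,
) -> list[tuple[str, float]]:
    """Rank items by vector_weight * vector_score + graph_weight * graph_score."""
    names = set(vector_hits) | set(graph_hits)
    fused = {
        name: vector_weight * vector_hits.get(name, 0.0)
        + graph_weight * graph_hits.get(name, 0.0)
        for name in names
    }
    # Highest combined score first; ties broken alphabetically for stable output
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up, already-normalized scores for illustration
vector_hits = {"palloc": 0.9, "MemoryContextAlloc": 0.8}
graph_hits = {"MemoryContextAlloc": 1.0, "AllocSetAlloc": 0.7}

for name, score in fuse_scores(vector_hits, graph_hits):
    print(f"{name}: {score:.2f}")
# MemoryContextAlloc: 0.88
# palloc: 0.54
# AllocSetAlloc: 0.28
```

A linear blend keeps the weights easy to reason about. An item found by both searches (here `MemoryContextAlloc`) can outrank one that scores higher in a single search.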

Multi-Domain Analysis

Switch between codebases:

from src.config import CPGConfig

# Analyze PostgreSQL
pg_config = CPGConfig()
pg_config.set_cpg_type("postgresql")

# Analyze Linux Kernel
lk_config = CPGConfig()
lk_config.set_cpg_type("linux_kernel")

Scenario-Based Analysis

Use the copilot for scenario-based analysis (intent is detected automatically):

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Security analysis - intent detected automatically
result = copilot.run("Find SQL injection vulnerabilities")
print(f"Intent: {result.get('intent')}")  # → 'security'

# Performance analysis
result = copilot.run("Find functions with high cyclomatic complexity")
print(f"Intent: {result.get('intent')}")  # → 'performance'

Best Practices

Writing Effective Questions

Good questions:

- “What functions handle memory allocation in the buffer manager?”
- “Show the call path from parser to executor”
- “Find unsanitized inputs that reach database queries”

Less effective:

- “Tell me about the code” (too vague)
- “Fix this bug” (action request, not analysis)
- “Everything about transactions” (too broad)

Optimizing Performance

  1. Be specific - Narrow questions get faster answers
  2. Use structural queries - When you know the pattern
  3. Enable caching - For repeated similar queries
  4. Limit scope - Add file or subsystem constraints
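This guide does not say how caching is enabled, so the sketch below shows a generic approach instead: a wrapper that memoizes answers keyed by a normalized question string. `CachedCopilot` is a hypothetical name. `StubCopilot` stands in for MultiScenarioCopilot (or any object with a `run(question)` method) so the example runs on its own. Exact-match caching only helps with repeated questions. Reusing answers for merely *similar* questions would need something like embedding-based lookup, which is not shown.

```python
class CachedCopilot:
    """Hypothetical wrapper that caches results of copilot.run(question)."""

    def __init__(self, copilot):
        self._copilot = copilot
        self._cache: dict[str, dict] = {}

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share a key
        return " ".join(question.lower().split())

    def run(self, question: str) -> dict:
        key = self._key(question)
        if key not in self._cache:
            self._cache[key] = self._copilot.run(question)
        return self._cache[key]


class StubCopilot:
    """Stand-in for MultiScenarioCopilot that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def run(self, question: str) -> dict:
        self.calls += 1
        return {"answer": f"answer to: {question}", "confidence": 0.8}


stub = StubCopilot()
copilot = CachedCopilot(stub)
copilot.run("What does CommitTransaction do?")
copilot.run("what does  CommitTransaction do?")  # served from the cache
print(stub.calls)  # 1
```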

Interpreting Answers

  1. Check sources - Verify the code references
  2. Consider confidence - Verify lower-confidence answers manually
  3. Follow up - Ask clarifying questions
  4. Cross-reference - Compare with actual code
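Checking sources and confidence can be partly automated. The sketch below flags an answer for manual review when its confidence is below a threshold or when a cited file or line cannot be found in a local checkout. The helper names are hypothetical. The 0.7 default (the bottom of the "good" band) is an assumption. The sketch also assumes `file` is either a path relative to the repository root or a bare file name such as `xact.c`. `ghost.c` is a deliberately missing, made-up entry.

```python
import tempfile
from pathlib import Path


def unverified_sources(result: dict, repo_root: str) -> list[dict]:
    """Return sources whose file or line cannot be located under repo_root.

    Assumes 'file' is either a path relative to repo_root or a bare file name.
    """
    root = Path(repo_root)
    missing = []
    for src in result.get("sources", []):
        # Try the path as given, then any file with the same name (slow on large trees)
        candidates = [root / src["file"], *root.rglob(Path(src["file"]).name)]
        found = False
        for path in candidates:
            if path.is_file():
                with path.open(encoding="utf-8", errors="replace") as fh:
                    n_lines = sum(1 for _ in fh)
                if 1 <= src["line"] <= n_lines:
                    found = True
                    break
        if not found:
            missing.append(src)
    return missing


def needs_manual_review(result: dict, repo_root: str, threshold: float = 0.7) -> bool:
    """Flag answers below the (assumed) threshold or with unverifiable sources."""
    low_confidence = result.get("confidence", 0.0) < threshold
    return low_confidence or bool(unverified_sources(result, repo_root))


# Usage against a throwaway directory standing in for a source checkout
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "xact.c").write_text("/* placeholder */\n" * 2000)
    result = {
        "confidence": 0.85,
        "sources": [
            {"method": "CommitTransaction", "file": "xact.c", "line": 1234},
            {"method": "Ghost", "file": "ghost.c", "line": 1},
        ],
    }
    flagged = unverified_sources(result, tmp)
    review = needs_manual_review(result, tmp)

print([s["method"] for s in flagged])  # ['Ghost']
print(review)  # True
```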

Workflow Integration

CI/CD Integration

# .github/workflows/code-analysis.yml
- name: Run Code Analysis
  run: |
    python - <<'EOF'
    import sys
    from src.workflow import MultiScenarioCopilot

    copilot = MultiScenarioCopilot()
    result = copilot.run('Find potential security issues')
    # Fail the job when any critical findings are reported
    if result.get('critical_count', 0) > 0:
        sys.exit(1)
    EOF

Code Review

# Analyze a patch
python examples/demo_patch_review.py --patch changes.diff

# Output: Security, performance, and architecture findings

Documentation Generation

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Document the transaction subsystem")

# result['answer'] contains generated documentation
print(result['answer'])

Next Steps