Technical writer generating API documentation, module overviews, and diagrams from code using CPG analysis.
Table of Contents
- Quick Start
- How It Works
- Intent Classification
- Two-Phase Architecture
- Eight Documentation Handlers
- CPG Retrieval Pipeline
- Phase 1: Direct Function Lookup
- Phase 2: Pattern-Based Search
- Phase 3: Subsystem-Based Search
- Phase 4: Keyword Fallback
- CallGraphAnalyzer Integration
- Documentation Types
- Function Documentation
- Module Overview
- MVD (Minimum Viable Documentation)
- Coverage Report
- Diagrams
- Pipeline Documentation
- Business Logic Extraction
- Type Documentation
- Export Formats
- CLI Usage
- Example Questions
- Related Scenarios
Quick Start
# Select Documentation Scenario
/select 03
How It Works
Intent Classification
The DocumentationIntentDetector classifies each query into one of 10 intents (plus a fallback), sorted by priority. Each intent has English and Russian keywords, which are matched morphologically.
| Intent | Priority | Description |
|---|---|---|
| mvd_doc | 3 | Minimum Viable Documentation for project |
| module_overview | 5 | Module/subsystem overview |
| coverage_doc | 7 | Documentation coverage analysis |
| diagram_doc | 8 | Diagrams (call graph, component, dependency) |
| pipeline_doc | 10 | Processing pipeline documentation |
| business_logic_doc | 12 | Business logic extraction |
| type_doc | 14 | Type documentation and conversions |
| function_doc | 20 | Single function documentation |
| usage_example | 30 | Usage examples |
| api_reference | 40 | API reference |
If no intent matches, the fallback general_documentation is used (confidence=0.5).
The detector also extracts:
- target — function or module name from the query (via regex patterns and semantic mappings)
- format — output format: markdown (default), html, rst, latex
- diagram_type — for diagram_doc: component, dependency, or callgraph
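The classification described above can be pictured with a minimal sketch. It is not the DocumentationIntentDetector itself: the `Intent` dataclass, the keyword lists, plain substring matching (instead of morphological matching), the confidence of 1.0 on a match, and the assumption that a lower priority number is checked first are all illustrative. Only the intent names, their priorities, and the fallback with confidence 0.5 come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str
    priority: int
    keywords: tuple[str, ...]


# Hypothetical subset of the intent table; the keyword lists are made up.
INTENTS = [
    Intent("function_doc", 20, ("function", "method")),
    Intent("diagram_doc", 8, ("diagram", "call graph")),
    Intent("coverage_doc", 7, ("coverage",)),
    Intent("mvd_doc", 3, ("minimum viable", "mvd")),
]


def classify(query: str) -> tuple[str, float]:
    """Return (intent_name, confidence) for the first matching intent."""
    text = query.lower()
    # Assumption: intents with a lower priority number are checked first.
    for intent in sorted(INTENTS, key=lambda i: i.priority):
        if any(keyword in text for keyword in intent.keywords):
            return intent.name, 1.0  # assumed confidence for a keyword hit
    # No intent matched: use the documented fallback.
    return "general_documentation", 0.5


print(classify("Generate call graph diagram for the parser function"))
print(classify("hello"))
```

Sorting by priority before matching is what lets a query mentioning both "diagram" and "function" resolve to a single intent in this sketch.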
Two-Phase Architecture
S03 uses a two-phase approach:
Phase 1: Handler-based (no LLM). The integrate_handlers() function tries 8 registered template-based handlers. Each handler produces structured documentation directly from CPG data using DocumentationReportFormatter. If a handler matches the detected intent and finds results, the response is returned without calling the LLM.
Phase 2: LLM fallback. If no handler matched, the full pipeline runs: 4-phase CPG retrieval collects methods, CallGraphAnalyzer enriches with usage patterns, then LLM generates documentation with graph context.
Query -> DocumentationIntentDetector -> integrate_handlers()
  |
  +-- Phase 1: Handler matched?
  |     Yes -> Structured report (no LLM)
  |     No  -> Phase 2: Full pipeline
  |
  +-- Phase 2: CPG Retrieval (4 phases) -> CallGraphAnalyzer
        -> LLM with graph context -> Documentation
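A rough sketch of this control flow, under stated assumptions: the handler signature, the `answer` function, and both stand-in functions are hypothetical and do not reproduce integrate_handlers(). Trying handlers in ascending priority order is also an assumption.

```python
from typing import Callable, Optional

# A handler takes (intent, query) and returns a report, or None if it does not apply.
Handler = Callable[[str, str], Optional[str]]


def function_doc_handler(intent: str, query: str) -> Optional[str]:
    # Hypothetical stand-in for a template-based handler (no LLM involved).
    if intent != "function_doc":
        return None
    return f"[template report] {query}"


def llm_pipeline(query: str) -> str:
    # Placeholder for CPG retrieval + CallGraphAnalyzer + LLM generation.
    return f"[LLM-generated] {query}"


def answer(intent: str, query: str, handlers: list[tuple[int, Handler]]) -> str:
    # Phase 1: try handlers in ascending priority order (an assumed ordering).
    for _, handler in sorted(handlers, key=lambda pair: pair[0]):
        report = handler(intent, query)
        if report is not None:
            return report
    # Phase 2: no handler produced a result, so fall back to the LLM pipeline.
    return llm_pipeline(query)


HANDLERS = [(10, function_doc_handler)]
print(answer("function_doc", "Generate documentation for heap_insert", HANDLERS))
print(answer("usage_example", "Show usage examples for heap_insert", HANDLERS))
```

Returning None from a handler is one simple way to express "did not match or found nothing", which keeps the fallback decision in a single place.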
Eight Documentation Handlers
The 8 handlers are registered in HandlerRegistry("documentation"):
| Handler | Priority | Intent |
|---|---|---|
| MVDDocHandler | 3 | mvd_doc |
| CoverageDocHandler | 7 | coverage_doc |
| DiagramDocHandler | 8 | diagram_doc |
| FunctionDocHandler | 10 | function_doc |
| BusinessLogicDocHandler | 12 | business_logic_doc |
| TypeDocHandler | 14 | type_doc |
| PipelineDocHandler | 15 | pipeline_doc |
| ModuleOverviewHandler | 20 | module_overview |
All handlers inherit from DocumentationHandler (extends BaseHandler), which provides incremental documentation session tracking — _get_doc_session(), _update_doc_session(), and _get_session_delta() allow tracking which functions/modules have been documented across multiple queries.
CPG Retrieval Pipeline
The LLM fallback phase uses a 4-phase CPG retrieval pipeline to find relevant methods:
Phase 1: Direct Function Lookup
extract_function_names_from_query() extracts function names using 8 regex patterns:
- Backtick quoted: `funcName`
- Function call syntax: funcName(
- “function X” / “method X” phrases
- “X function” reversed order
- “for X” / “of X” patterns
- CamelCase identifiers (e.g., ExecInitNode)
- snake_case identifiers (e.g., heap_insert)
- Domain-specific naming conventions from plugin
Found names are queried directly in nodes_method for exact matches.
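To make the pattern list more concrete, here is a small sketch covering five of the eight pattern kinds. The regular expressions and the `extract_names` helper are illustrative guesses, not the patterns used by extract_function_names_from_query(), and the plugin-specific conventions are left out.

```python
import re

# Illustrative patterns only; the real function uses eight, some plugin-specific.
PATTERNS = [
    re.compile(r"`([A-Za-z_]\w*)`"),                          # backtick quoted
    re.compile(r"\b([A-Za-z_]\w*)\("),                        # call syntax: name(
    re.compile(r"\b(?:function|method)\s+([A-Za-z_]\w*)"),    # "function X" / "method X"
    re.compile(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+)\b"),   # CamelCase
    re.compile(r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b"),       # snake_case
]


def extract_names(query: str) -> list[str]:
    """Collect candidate function names, de-duplicated in order of discovery."""
    seen: dict[str, None] = {}
    for pattern in PATTERNS:
        for name in pattern.findall(query):
            seen.setdefault(name, None)
    return list(seen)


print(extract_names("Generate documentation for heap_insert"))
print(extract_names("How does `ExecInitNode` work?"))
```

Loose patterns such as "function X" can capture ordinary words, which is one reason an exact-match lookup against the method table is a sensible next step.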
Phase 2: Pattern-Based Search
If the direct lookup returns no results, _get_documentation_keywords() loads keyword-to-pattern mappings from the domain plugin, and the matched patterns are searched with search_methods_by_name_pattern().
Phase 3: Subsystem-Based Search
If there are still no results, subsystem keywords and aliases are loaded from the domain plugin. The query is matched against subsystem names using keyword_match_morphological(), and get_methods_by_subsystem() then retrieves the methods.
Phase 4: Keyword Fallback
As a last resort, _extract_significant_words() extracts meaningful words from the query (filtering out stop words) and searches for each word individually.
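The four retrieval phases form a cascade in which each phase runs only if the previous ones found nothing. The sketch below shows that shape with two toy phases. The stop-word list, the `METHODS` index, and all function names are hypothetical stand-ins for the real retrieval functions.

```python
from typing import Callable, Iterable

# Hypothetical stop-word list; the real one is not shown in this document.
STOP_WORDS = {"the", "a", "an", "for", "of", "in", "to", "show", "generate", "document"}

# Toy method index standing in for the CPG method table.
METHODS = ["heap_insert", "heap_update", "ExecInitNode"]


def significant_words(query: str) -> list[str]:
    # Lowercase each token, strip punctuation, drop stop words and very short words.
    words = [w.strip(".,?!\"'").lower() for w in query.split()]
    return [w for w in words if len(w) > 2 and w not in STOP_WORDS]


def direct_lookup(query: str) -> list[str]:
    # Exact match of a known method name against a query token.
    tokens = query.split()
    return [m for m in METHODS if m in tokens]


def keyword_fallback(query: str) -> list[str]:
    # Substring match of any significant word against method names.
    words = significant_words(query)
    return [m for m in METHODS if any(w in m.lower() for w in words)]


def retrieve(query: str, phases: Iterable[Callable[[str], list[str]]]) -> list[str]:
    # Run each phase in order and stop at the first one that returns anything.
    for phase in phases:
        methods = phase(query)
        if methods:
            return methods
    return []


print(retrieve("Document heap_insert", [direct_lookup, keyword_fallback]))
print(retrieve("Show the heap internals", [direct_lookup, keyword_fallback]))
```

Ordering the phases from most to least precise means the broad keyword search only contributes when nothing more specific matched.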
CallGraphAnalyzer Integration
After methods are retrieved, CallGraphAnalyzer(cpg) from src/analysis/ enriches each method with usage pattern analysis:
- find_all_callers(method_name, max_depth): who uses this method
- find_all_callees(method_name, max_depth): what this method calls
- analyze_impact(method_name): importance score for documentation priority
The analyzer identifies:
- Key methods — methods with high impact_score are prioritized for thorough documentation
- Entry points — methods with no callers or matching entry point patterns from the domain plugin
- Public API — methods with many callers are flagged as public API
- Call examples — direct callers provide usage context for documentation
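As an illustration of the caller analysis and entry-point detection listed above, the following sketch works on a toy in-memory call graph. It is not CallGraphAnalyzer: the `CALLS` dictionary and the helper names `callers_within` and `entry_points` are assumptions, and entry-point patterns from the domain plugin are not modeled.

```python
from collections import deque

# Toy call graph: caller -> callees. All names are illustrative.
CALLS = {
    "main": ["parse", "execute"],
    "parse": ["tokenize"],
    "execute": ["heap_insert"],
    "vacuum": ["heap_insert"],
}


def callers_within(method: str, max_depth: int) -> set[str]:
    """Direct and transitive callers of `method`, up to max_depth hops away."""
    # Build the reversed graph: callee -> callers.
    reverse: dict[str, list[str]] = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    found: set[str] = set()
    queue = deque([(method, 0)])
    # Breadth-first walk over reversed edges, bounded by max_depth.
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue
        for caller in reverse.get(name, []):
            if caller not in found:
                found.add(caller)
                queue.append((caller, depth + 1))
    return found


def entry_points() -> set[str]:
    # Methods that call others but are never called themselves.
    called = {callee for callees in CALLS.values() for callee in callees}
    return set(CALLS) - called


print(sorted(callers_within("heap_insert", 1)))
print(sorted(callers_within("heap_insert", 2)))
print(sorted(entry_points()))
```

The size of the caller set at a given depth is one plausible input to an impact score, although how analyze_impact() actually computes its score is not described here.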
Documentation Types
Function Documentation
Generates API documentation for individual functions with signature, parameters, description, usage examples, and related functions (callers/callees from call graph).
Module Overview
Generates architecture documentation for modules/subsystems including component structure, key functions, data flow, and inter-module dependencies.
MVD (Minimum Viable Documentation)
Generates project-level Minimum Viable Documentation — a structured overview covering architecture, key modules, entry points, and API surface.
Coverage Report
Analyzes documentation coverage — identifies undocumented functions, calculates comment coverage ratio, and highlights documentation gaps.
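The coverage ratio can be read as documented functions divided by total functions. The sketch below assumes that definition and a simple input shape (function name mapped to whether it has a doc comment). Both are assumptions, as is the `coverage_report` helper name; the real handler may weight or group results differently.

```python
def coverage_report(methods: dict[str, bool]) -> dict:
    """Summarize coverage; `methods` maps a function name to True if it has a doc comment."""
    total = len(methods)
    documented = sum(methods.values())
    undocumented = sorted(name for name, has_doc in methods.items() if not has_doc)
    # Avoid division by zero for an empty subsystem.
    ratio = documented / total if total else 0.0
    return {"ratio": ratio, "undocumented": undocumented}


report = coverage_report(
    {"heap_insert": True, "heap_update": False, "heap_delete": True, "heap_lock": False}
)
print(report)
```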
Diagrams
Generates Mermaid diagrams from CPG data:
- Call graph: function call relationships
- Component diagram: subsystem/module architecture
- Dependency diagram: file/module dependency graph
The diagram_type is auto-detected from the query (component, dependency, or callgraph).
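For the call graph case, turning edges into Mermaid text is mostly string formatting. A minimal sketch follows, assuming the edges are already available as (caller, callee) pairs. The `to_mermaid` helper is hypothetical and does not reflect how DiagramDocHandler builds its output.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (caller, callee) pairs as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for caller, callee in edges:
        lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


print(to_mermaid([("main", "parse"), ("main", "execute"), ("execute", "heap_insert")]))
```

Identifiers containing spaces or punctuation would need quoting or node aliases in Mermaid, which this sketch does not handle.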
Pipeline Documentation
Documents data processing pipelines and execution flows — traces query processing from input to output through the call graph.
Business Logic Extraction
Extracts business rules, invariants, and validation logic from code — identifies conditional logic, assertions, and constraint checks.
Type Documentation
Documents type systems, type conversions, and cast operations — analyzes type flow through function signatures and return types.
Export Formats
Documentation can be exported to external platforms:
- GitBook: GitBookExporter(output_dir) generates SUMMARY.md with chapter structure and individual markdown files
- Docusaurus: DocusaurusExporter(output_dir) generates MDX files with frontmatter and sidebar configuration
The DocumentationReportFormatter provides 8 format methods matching each documentation type: format_function_documentation, format_module_overview, format_pipeline_documentation, format_coverage_report, format_diagram_documentation, format_type_documentation, format_mvd_document, format_business_logic_report.
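For the GitBook export described above, the general shape of the output (one markdown file per chapter plus a SUMMARY.md that links them) can be sketched as follows. The `write_gitbook` helper, the file naming scheme, and the flat chapter layout are assumptions and not the behavior of GitBookExporter.

```python
import tempfile
from pathlib import Path


def write_gitbook(output_dir: Path, chapters: dict[str, str]) -> Path:
    """Write one markdown file per chapter and a SUMMARY.md linking to them."""
    output_dir.mkdir(parents=True, exist_ok=True)
    summary = ["# Summary", ""]
    for title, body in chapters.items():
        # Hypothetical naming scheme: lowercase title with spaces replaced by dashes.
        filename = title.lower().replace(" ", "-") + ".md"
        (output_dir / filename).write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
        summary.append(f"* [{title}]({filename})")
    summary_path = output_dir / "SUMMARY.md"
    summary_path.write_text("\n".join(summary) + "\n", encoding="utf-8")
    return summary_path


# Usage: write into a temporary directory and show the generated table of contents.
with tempfile.TemporaryDirectory() as tmp:
    path = write_gitbook(Path(tmp) / "book", {"Module Overview": "Generated text."})
    print(path.read_text(encoding="utf-8"))
```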
CLI Usage
# Generate function documentation
python -m src.cli query "Generate documentation for heap_insert"
# Module overview
python -m src.cli query "Document the executor module architecture"
# Documentation coverage
python -m src.cli query "Show documentation coverage for the storage subsystem"
# Generate diagrams
python -m src.cli query "Generate call graph diagram for the parser module"
# Full documentation generation
python -m src.cli.generate_docs full --output ./docs/generated --language en
# Generate specific section
python -m src.cli.generate_docs section mvd_doc --language en
# Export to GitBook/Docusaurus
python -m src.cli.export_docs --format gitbook --output ./gitbook-docs
python -m src.cli.export_docs --format docusaurus --output ./docusaurus-docs
Example Questions
- “Generate documentation for [function_name]”
- “Document the [module] architecture”
- “Show documentation coverage for [subsystem]”
- “Generate call graph diagram for [module]”
- “Generate Minimum Viable Documentation for the project”
- “Extract business logic from [module]”
- “Document type conversions in [subsystem]”
- “Explain the query processing pipeline”
- “Show usage examples for [function]”
- “List all public API functions in [module]”
Related Scenarios
- Onboarding - Codebase exploration and understanding
- Architecture - Deeper structural and dependency analysis
- Feature Development - Finding integration points
S03 vs S01 vs S11: S03 generates documentation artifacts (API docs, module overviews, diagrams, coverage reports) from code. S01 (onboarding) helps developers explore and understand unfamiliar codebases interactively. S11 (architecture) focuses on structural analysis — dependency cycles, layer violations, coupling metrics. All three use CPG data and CallGraphAnalyzer but serve different purposes: S03 produces documentation, S01 answers exploration questions, S11 detects architectural issues.