Scenario 03: Documentation Generation

Scenario for a technical writer generating API documentation, module overviews, and diagrams from code using CPG (Code Property Graph) analysis.

Quick Start

# Select Documentation Scenario
/select 03

How It Works

Intent Classification

The DocumentationIntentDetector classifies each query into one of 10 intents (plus a fallback), which are evaluated in priority order. Each intent has English and Russian keywords, matched morphologically so that inflected word forms also count as matches.

Intent                Priority  Description
mvd_doc               3         Minimum Viable Documentation for the project
module_overview       5         Module/subsystem overview
coverage_doc          7         Documentation coverage analysis
diagram_doc           8         Diagrams (call graph, component, dependency)
pipeline_doc          10        Processing pipeline documentation
business_logic_doc    12        Business logic extraction
type_doc              14        Type documentation and conversions
function_doc          20        Single-function documentation
usage_example         30        Usage examples
api_reference         40        API reference

If no intent matches, the fallback general_documentation is used (confidence=0.5).
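
A minimal sketch of priority-ordered classification with a fallback. The keyword lists are hypothetical, matching is plain lowercase substring search rather than EN/RU morphological matching, and it assumes a lower priority number is checked first. The 1.0 confidence for a keyword hit is a placeholder, since the detector's real scoring is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str
    priority: int
    keywords: tuple[str, ...]


# Hypothetical keywords; the real detector has EN and RU keyword lists
# with morphological matching, reduced here to lowercase substring checks.
INTENTS = [
    Intent("mvd_doc", 3, ("minimum viable documentation", "mvd")),
    Intent("module_overview", 5, ("architecture", "overview")),
    Intent("coverage_doc", 7, ("coverage",)),
    Intent("diagram_doc", 8, ("diagram",)),
    Intent("pipeline_doc", 10, ("pipeline",)),
    Intent("function_doc", 20, ("documentation for", "document function")),
    Intent("usage_example", 30, ("usage example",)),
    Intent("api_reference", 40, ("api",)),
]

# Assumption: a lower priority number is checked first.
ORDERED = sorted(INTENTS, key=lambda intent: intent.priority)


def classify(query: str) -> tuple[str, float]:
    """Return (intent name, confidence) for the first intent whose keyword appears."""
    text = query.lower()
    for intent in ORDERED:
        if any(keyword in text for keyword in intent.keywords):
            return intent.name, 1.0  # placeholder confidence for a keyword hit
    return "general_documentation", 0.5  # documented fallback confidence
```

Because mvd_doc is checked before function_doc, "Generate Minimum Viable Documentation for the project" resolves to mvd_doc even though it also contains "documentation for".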

The detector also extracts:

  • target: function or module name from the query (via regex patterns and semantic mappings)
  • format: output format, one of markdown (default), html, rst, latex
  • diagram_type: for diagram_doc only, one of component, dependency, or callgraph
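
The format and diagram_type extraction can be sketched the same way. This is an illustration under assumptions, not the detector's code: formats are recognized as whole words, and the spellings "call graph", "call-graph", "call_graph" and "callgraph" are all normalized to callgraph.

```python
import re

FORMATS = ("html", "rst", "latex")  # markdown is the default when none is named
DIAGRAM_TYPES = ("component", "dependency")  # callgraph is detected separately


def extract_options(query: str) -> dict:
    """Pull the output format and (optional) diagram type out of a query."""
    text = query.lower()
    fmt = next((f for f in FORMATS if re.search(rf"\b{f}\b", text)), "markdown")
    if re.search(r"\bcall[\s_-]?graph\b", text):
        diagram = "callgraph"
    else:
        diagram = next((d for d in DIAGRAM_TYPES if d in text), None)
    return {"format": fmt, "diagram_type": diagram}
```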

Two-Phase Architecture

S03 uses a two-phase approach:

Phase 1: Handler-based (no LLM). The integrate_handlers() function tries 8 registered template-based handlers. Each handler produces structured documentation directly from CPG data using DocumentationReportFormatter. If a handler matches the detected intent and finds results, the response is returned without calling the LLM.

Phase 2: LLM fallback. If no handler matches the detected intent, or the matching handler finds no results, the full pipeline runs: 4-phase CPG retrieval collects methods, CallGraphAnalyzer enriches them with usage patterns, and the LLM then generates documentation using that graph context.

Query -> DocumentationIntentDetector -> integrate_handlers()
  |                                          |
  |  Phase 1: Handler matched?      Yes -> Structured report (no LLM)
  |                                 No  -> Phase 2: Full pipeline
  |
  Phase 2: CPG Retrieval (4 phases) -> CallGraphAnalyzer
           -> LLM with graph context -> Documentation
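
The two-phase dispatch can be sketched as follows. Handler, answer() and llm_pipeline are hypothetical names (the real entry point is integrate_handlers(), whose signature is not documented here). The sketch assumes a handler signals "no results" by returning None and that handlers are tried in ascending priority order.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Handler:
    name: str
    priority: int
    intent: str
    run: Callable[[str], Optional[str]]  # returns a report, or None when nothing is found


def answer(query: str, intent: str, handlers: list[Handler],
           llm_pipeline: Callable[[str], str]) -> tuple[str, str]:
    """Return (source, text): a handler report if one applies, else the LLM result."""
    # Phase 1: template handlers in ascending priority order (assumed); no LLM call.
    for handler in sorted(handlers, key=lambda h: h.priority):
        if handler.intent == intent:
            report = handler.run(query)
            if report:
                return "handler", report
    # Phase 2: retrieval, call-graph enrichment and the LLM, hidden behind one callable here.
    return "llm", llm_pipeline(query)
```

Keeping the LLM behind a single callable makes the cost split explicit: handler hits never touch it.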

Eight Documentation Handlers

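The 8 handlers are registered in HandlerRegistry("documentation"):

Handler                   Priority  Intent
MVDDocHandler             3         mvd_doc
CoverageDocHandler        7         coverage_doc
DiagramDocHandler         8         diagram_doc
FunctionDocHandler        10        function_doc
BusinessLogicDocHandler   12        business_logic_doc
TypeDocHandler            14        type_doc
PipelineDocHandler        15        pipeline_doc
ModuleOverviewHandler     20        module_overview

Handler priorities are set in the registry and are not the same as the intent priorities above (FunctionDocHandler, for example, has priority 10, while the function_doc intent has priority 20).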

All handlers inherit from DocumentationHandler (a subclass of BaseHandler), which adds incremental session tracking: _get_doc_session(), _update_doc_session(), and _get_session_delta() record which functions and modules have already been documented across multiple queries.
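
A minimal sketch of incremental session tracking, assuming the session is simply the set of names already documented. DocSession and its methods are hypothetical stand-ins for the state behind _get_doc_session(), _update_doc_session() and _get_session_delta().

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class DocSession:
    """Hypothetical session state: names already documented in this conversation."""
    documented: set[str] = field(default_factory=set)

    def update(self, names: Iterable[str]) -> None:
        """Record what the current query documented (cf. _update_doc_session())."""
        self.documented.update(names)

    def delta(self, candidates: Iterable[str]) -> list[str]:
        """Candidates not yet documented, sorted (cf. _get_session_delta())."""
        return sorted(set(candidates) - self.documented)
```

A follow-up query about the same module could then generate documentation only for the names returned by delta().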

CPG Retrieval Pipeline

The LLM fallback phase uses a 4-phase CPG retrieval pipeline to find relevant methods:

Phase 1: Direct Function Lookup

extract_function_names_from_query() extracts function names using 8 regex patterns:

  1. Backtick quoted: `funcName`
  2. Function call syntax: funcName(
  3. “function X” / “method X” phrases
  4. “X function” reversed order
  5. “for X” / “of X” patterns
  6. CamelCase identifiers (e.g., ExecInitNode)
  7. snake_case identifiers (e.g., heap_insert)
  8. Domain-specific naming conventions from plugin

Extracted names are looked up directly in nodes_method for exact matches.
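
Patterns 1 to 7 might be approximated with regular expressions like the ones below. The exact regexes and the stop-word list are assumptions, and pattern 8 is left out because it depends on the domain plugin. Over-matching is tolerable in a design like this, since every candidate is then checked for an exact match in nodes_method.

```python
import re

# Hypothetical regexes approximating patterns 1-7; pattern 8 needs the domain plugin.
PATTERNS = [
    re.compile(r"`([A-Za-z_]\w*)`"),                                # 1. `funcName`
    re.compile(r"\b([A-Za-z_]\w*)\("),                              # 2. funcName(
    re.compile(r"\b(?:function|method)\s+([A-Za-z_]\w*)", re.I),    # 3. function X
    re.compile(r"\b([A-Za-z_]\w*)\s+(?:function|method)\b", re.I),  # 4. X function
    re.compile(r"\b(?:for|of)\s+([A-Za-z_]\w*)", re.I),             # 5. for X / of X
    re.compile(r"\b([A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+)\b"),         # 6. CamelCase
    re.compile(r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b"),             # 7. snake_case
]

# Words that patterns 3-5 would otherwise capture as bogus names.
STOP_WORDS = {"the", "a", "an", "this", "that", "module", "function", "method"}


def extract_function_names(query: str) -> list[str]:
    """Collect candidate names from all patterns, deduplicated in first-seen order."""
    names: list[str] = []
    for pattern in PATTERNS:
        for match in pattern.findall(query):
            if match.lower() not in STOP_WORDS and match not in names:
                names.append(match)
    return names
```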

Phase 2: Domain Keyword Patterns

If Phase 1 returns no methods, _get_documentation_keywords() loads keyword-to-pattern mappings from the domain plugin; the patterns whose keywords appear in the query are then passed to search_methods_by_name_pattern().

Phase 3: Subsystem Lookup

If there are still no results, subsystem keywords and aliases are loaded from the domain plugin. The query is matched against subsystem names with keyword_match_morphological(), and get_methods_by_subsystem() retrieves the methods of each matched subsystem.
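
Subsystem matching can be illustrated with crude English suffix stripping standing in for keyword_match_morphological(), whose actual algorithm (and Russian support) is not described here. The SUBSYSTEMS map is a hypothetical example of plugin keywords and aliases.

```python
import re

# Hypothetical plugin data: subsystem name -> keywords and aliases.
SUBSYSTEMS = {
    "executor": ["executor", "execution", "exec"],
    "storage": ["storage", "buffer", "page"],
    "parser": ["parser", "parse", "grammar"],
}

# Crude English suffix stripping, checked in this order.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def match_subsystems(query: str) -> list[str]:
    """Subsystems with a keyword or alias that shares a stem with some query word."""
    words = {stem(w) for w in re.findall(r"[a-z]+", query.lower())}
    return [name for name, aliases in SUBSYSTEMS.items()
            if any(stem(alias) in words for alias in aliases)]
```

With this stemming, "parsing" and "parser" both reduce to "pars", so a query about parsing still reaches the parser subsystem.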

Phase 4: Keyword Fallback

As a last resort, _extract_significant_words() extracts meaningful words from the query (filtering out stop words), and a separate search is run for each word.
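
The overall cascade and the Phase 4 keyword fallback reduce to a small amount of control flow. In this sketch, retrieve(), keyword_fallback() and significant_words() are hypothetical; the stop-word list is an illustrative guess at what _extract_significant_words() filters, and each phase is modeled as a function from the query to a list of method names.

```python
import re
from typing import Callable

# Illustrative stop words; the real list used by _extract_significant_words() is not shown.
STOP_WORDS = {"the", "a", "an", "for", "of", "in", "to", "and", "how", "does", "what",
              "show", "generate", "document", "documentation", "explain"}

Phase = Callable[[str], list[str]]


def significant_words(query: str) -> list[str]:
    """Lowercase tokens that are not stop words and are longer than two characters."""
    return [w for w in re.findall(r"[a-z_]+", query.lower())
            if w not in STOP_WORDS and len(w) > 2]


def keyword_fallback(query: str, search: Phase) -> list[str]:
    """Phase 4: search each significant word separately, keeping first-seen order."""
    found: list[str] = []
    for word in significant_words(query):
        for method in search(word):
            if method not in found:
                found.append(method)
    return found


def retrieve(query: str, phases: list[Phase]) -> list[str]:
    """Run the phases in order and stop at the first one that returns any methods."""
    for phase in phases:
        methods = phase(query)
        if methods:
            return methods
    return []
```

A full cascade would pass four phases to retrieve(), with the last one wrapping keyword_fallback() around a name search.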

CallGraphAnalyzer Integration

After methods are retrieved, CallGraphAnalyzer(cpg) from src/analysis/ enriches each method with usage pattern analysis:

  • find_all_callers(method_name, max_depth) — who uses this method
  • find_all_callees(method_name, max_depth) — what this method calls
  • analyze_impact(method_name) — importance score for documentation priority

The analyzer identifies:

  • Key methods: methods with a high impact_score are prioritized for thorough documentation
  • Entry points: methods with no callers, or matching entry-point patterns from the domain plugin
  • Public API: methods with many callers are flagged as public API
  • Call examples: direct callers provide usage context for the documentation
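
To make the caller/callee analysis concrete, here is a toy in-memory call graph. It borrows the method names find_all_callers and find_all_callees from the list above, but it is not the CallGraphAnalyzer implementation: the breadth-first traversal, the max_depth semantics and the min_callers threshold for public API are assumptions. Impact scoring is omitted because its formula is not documented.

```python
from collections import defaultdict, deque


class MiniCallGraph:
    """Toy call graph built from (caller, callee) edges; not CallGraphAnalyzer."""

    def __init__(self, edges):
        self.callers = defaultdict(set)
        self.callees = defaultdict(set)
        self.nodes = set()
        for caller, callee in edges:
            self.callees[caller].add(callee)
            self.callers[callee].add(caller)
            self.nodes.update((caller, callee))

    @staticmethod
    def _reach(start, adjacency, max_depth):
        """Breadth-first search up to max_depth hops; excludes the start node."""
        seen, queue = set(), deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for nxt in adjacency.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
        return seen

    def find_all_callers(self, name, max_depth=3):
        return self._reach(name, self.callers, max_depth)

    def find_all_callees(self, name, max_depth=3):
        return self._reach(name, self.callees, max_depth)

    def entry_points(self):
        """Methods that nothing calls."""
        return sorted(n for n in self.nodes if not self.callers.get(n))

    def public_api(self, min_callers=2):
        """Methods with at least min_callers distinct direct callers (threshold assumed)."""
        return sorted(n for n in self.nodes if len(self.callers.get(n, ())) >= min_callers)
```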

Documentation Types

Function Documentation

Generates API documentation for individual functions with signature, parameters, description, usage examples, and related functions (callers/callees from call graph).

Module Overview

Generates architecture documentation for modules/subsystems including component structure, key functions, data flow, and inter-module dependencies.

MVD (Minimum Viable Documentation)

Generates project-level Minimum Viable Documentation — a structured overview covering architecture, key modules, entry points, and API surface.

Coverage Report

Analyzes documentation coverage — identifies undocumented functions, calculates comment coverage ratio, and highlights documentation gaps.
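
Assuming comment coverage means the share of methods that carry a non-empty comment, the ratio and the list of gaps can be computed as below. The Method record and this exact definition are assumptions; the real report may weigh or filter methods differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Method:
    name: str
    comment: Optional[str]  # hypothetical: the method's doc comment, if any


def coverage_report(methods: list[Method]) -> dict:
    """Coverage = documented methods / all methods; blank comments count as missing."""
    undocumented = sorted(m.name for m in methods if not (m.comment or "").strip())
    total = len(methods)
    ratio = (total - len(undocumented)) / total if total else 0.0
    return {"total": total, "coverage": round(ratio, 3), "undocumented": undocumented}
```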

Diagrams

Generates Mermaid diagrams from CPG data:

  • Call graph: function call relationships
  • Component diagram: subsystem/module architecture
  • Dependency diagram: file/module dependency graph

The diagram_type is auto-detected from the query (component, dependency, or callgraph).
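
A minimal sketch of rendering call edges as a Mermaid flowchart. Collecting the (caller, callee) pairs from the CPG is assumed to have happened already; the id sanitizing is a precaution for names such as the C++ Foo::bar, which are kept intact as quoted labels.

```python
import re


def _node_id(name: str) -> str:
    """Replace non-word characters (e.g. '::' in C++ names) so the id stays a simple token."""
    return re.sub(r"\W", "_", name)


def mermaid_call_graph(edges: list[tuple[str, str]]) -> str:
    """Render deduplicated (caller, callee) edges as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    for caller, callee in sorted(set(edges)):
        lines.append(f'    {_node_id(caller)}["{caller}"] --> {_node_id(callee)}["{callee}"]')
    return "\n".join(lines)
```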

Pipeline Documentation

Documents data processing pipelines and execution flows — traces query processing from input to output through the call graph.

Business Logic Extraction

Extracts business rules, invariants, and validation logic from code — identifies conditional logic, assertions, and constraint checks.

Type Documentation

Documents type systems, type conversions, and cast operations — analyzes type flow through function signatures and return types.

Export Formats

Documentation can be exported to external platforms:

  • GitBook: GitBookExporter(output_dir) generates SUMMARY.md with the chapter structure, plus individual markdown files
  • Docusaurus: DocusaurusExporter(output_dir) generates MDX files with frontmatter and a sidebar configuration
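
The GitBook layout can be sketched as a pure function that returns file contents, plus a small writer. This is not GitBookExporter's code; the chapter-to-pages input shape and the slug rules are assumptions. SUMMARY.md follows GitBook's conventional layout: a "# Summary" title, headings that split the table of contents into parts, and bulleted links for pages.

```python
import re
from pathlib import Path


def _slug(title: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_gitbook(chapters: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map relative file path -> file content, including SUMMARY.md."""
    files: dict[str, str] = {}
    summary = ["# Summary", ""]
    for chapter, pages in chapters.items():
        summary += [f"## {chapter}", ""]  # one part per chapter
        for title, body in pages.items():
            path = f"{_slug(chapter)}/{_slug(title)}.md"
            files[path] = f"# {title}\n\n{body}\n"
            summary.append(f"* [{title}]({path})")
        summary.append("")
    files["SUMMARY.md"] = "\n".join(summary)
    return files


def write_files(files: dict[str, str], output_dir: str) -> None:
    """Write every generated file under output_dir, creating folders as needed."""
    for rel, content in files.items():
        target = Path(output_dir) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
```

Separating content generation from writing keeps the layout logic testable without touching the file system.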

The DocumentationReportFormatter provides 8 format methods matching each documentation type: format_function_documentation, format_module_overview, format_pipeline_documentation, format_coverage_report, format_diagram_documentation, format_type_documentation, format_mvd_document, format_business_logic_report.

CLI Usage

# Generate function documentation
python -m src.cli query "Generate documentation for heap_insert"

# Module overview
python -m src.cli query "Document the executor module architecture"

# Documentation coverage
python -m src.cli query "Show documentation coverage for the storage subsystem"

# Generate diagrams
python -m src.cli query "Generate call graph diagram for the parser module"

# Full documentation generation
python -m src.cli.generate_docs full --output ./docs/generated --language en

# Generate specific section
python -m src.cli.generate_docs section mvd_doc --language en

# Export to GitBook/Docusaurus
python -m src.cli.export_docs --format gitbook --output ./gitbook-docs
python -m src.cli.export_docs --format docusaurus --output ./docusaurus-docs

Example Questions

  • “Generate documentation for [function_name]”
  • “Document the [module] architecture”
  • “Show documentation coverage for [subsystem]”
  • “Generate call graph diagram for [module]”
  • “Generate Minimum Viable Documentation for the project”
  • “Extract business logic from [module]”
  • “Document type conversions in [subsystem]”
  • “Explain the query processing pipeline”
  • “Show usage examples for [function]”
  • “List all public API functions in [module]”

S03 vs S01 vs S11: S03 generates documentation artifacts (API docs, module overviews, diagrams, coverage reports) from code. S01 (onboarding) helps developers explore and understand unfamiliar codebases interactively. S11 (architecture) focuses on structural analysis — dependency cycles, layer violations, coupling metrics. All three use CPG data and CallGraphAnalyzer but serve different purposes: S03 produces documentation, S01 answers exploration questions, S11 detects architectural issues.