Frequently Asked Questions (FAQ)

Detailed answers about CodeGraph and GoCPG architecture and subsystems.

Orchestration and Routing

How does scenario orchestration work?

Orchestration is implemented via a LangGraph graph with three nodes: classify_intent_node → pre_retrieval → route_by_intent. The MultiScenarioCopilot class (src/workflow/orchestration/copilot.py) builds this graph and executes it via run(query). At the classify_intent_node step, an LLM provider (Yandex AI, GigaChat, or OpenAI) determines the user’s query type. Then route_by_intent (src/workflow/orchestration/router.py) selects one of 21 scenarios for processing.

What scenarios does the system support?

The system supports 21 scenarios: S01 (onboarding), S02 (security), S03 (documentation), S04 (feature development), S05 (refactoring), S06 (performance), S07 (test coverage), S08 (compliance), S09 (code review), S10 (cross-repo analysis), S11 (architecture), S12 (tech debt), S13 (mass refactoring), S14 (security incident), S15 (debugging), S16 (entry points), S17 (file editing), S18 (code optimization — composite), S19 (standards check — composite), S20 (dependencies), S21 (structural pattern search).

What is route_by_intent and how does it work?

The route_by_intent function (src/workflow/orchestration/router.py:16) takes a MultiScenarioState with an intent field (populated during classification) and returns the name of the next workflow node. Internally, it uses a routing_map dictionary that maps intent values to handler function names (e.g., "onboarding" → "onboarding_workflow", "tech_debt" → "tech_debt_workflow"). It supports composite mode for complex queries.
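
The dictionary-based routing described above can be sketched as follows. This is a minimal illustration, not the actual router: the real routing_map covers all 21 scenarios, and the fallback to onboarding for unknown intents is an assumption.

```python
# Hypothetical sketch of intent-based routing in the spirit of route_by_intent.
# The real routing_map lives in src/workflow/orchestration/router.py;
# these three entries are illustrative.
routing_map = {
    "onboarding": "onboarding_workflow",
    "tech_debt": "tech_debt_workflow",
    "security": "security_workflow",
}

def route_by_intent(state: dict) -> str:
    """Return the name of the next workflow node for the classified intent."""
    # Unknown intents fall back to onboarding (assumed default behavior).
    return routing_map.get(state.get("intent"), "onboarding_workflow")
```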

What is MultiScenarioCopilot?

MultiScenarioCopilot (src/workflow/orchestration/copilot.py) is the system’s central orchestrator. It builds the LangGraph graph, runs intent classification and routing, then hands off control to the appropriate workflow handler. The run(query) method returns a state dict with fields query, intent, scenario_id, confidence, answer, evidence, and metadata.

What is classify_intent_node?

The classify_intent_node (src/workflow/orchestration/intent.py) performs LLM-based classification of the user’s intent. It takes the query text as input and populates intent, scenario_id, and confidence fields in the MultiScenarioState. It uses a prompt from config/prompts/{lang}/workflow_prompts.yaml listing all scenarios.

What is the pre_retrieval node?

The pre_retrieval node (src/workflow/orchestration/pre_retrieval.py) is an optional step between classification and routing (Phase E). It runs HybridRetriever.retrieve() to pre-fetch context from ChromaDB and DuckDB. Results are stored in state["pre_retrieval_results"]. Enabled via config.yaml → workflows.pre_retrieval.enable: true (on by default).


Authorization and Security

How is API authorization implemented?

Authorization is implemented through several mechanisms: JWT tokens, API keys, and OAuth 2.0. The central FastAPI dependency — get_auth_context (src/api/auth/middleware.py) — extracts and validates user data from each incoming request. The result is placed in an AuthContext container with fields user_id, username, role, scopes, and auth_method.

How does permission checking work?

The has_permission function (src/api/auth/middleware.py) checks whether a user has a specific permission based on their role (VIEWER, ANALYST, REVIEWER, ADMIN). Roles and permissions are defined in src/api/database/models.py. The system uses RBAC (Role-Based Access Control) with extensibility via scopes.
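
The role-to-permission check can be sketched like this. It is a hedged illustration: the real roles and permissions are defined in src/api/database/models.py, and the permission names used here are assumptions.

```python
# Hypothetical RBAC sketch in the spirit of has_permission; the permission
# names are illustrative, the four roles mirror those listed above.
ROLE_PERMISSIONS = {
    "VIEWER":   {"read"},
    "ANALYST":  {"read", "analyze"},
    "REVIEWER": {"read", "analyze", "review"},
    "ADMIN":    {"read", "analyze", "review", "admin"},
}

def has_permission(role: str, permission: str,
                   scopes: frozenset = frozenset()) -> bool:
    """Grant if the role carries the permission, or an explicit scope does."""
    return permission in ROLE_PERMISSIONS.get(role, set()) or permission in scopes
```

The scopes parameter models the extensibility mentioned above: a scope can grant a permission the base role lacks.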

What OAuth providers are supported?

Two OAuth providers are supported: SourceCraft (via Yandex ID, SourceCraftOAuth in src/api/auth/oauth.py) and GitVerse (via Sber ID, GitVerseOAuth). Both inherit a shared interface and support get_authorization_url(), exchange_code(), and get_user_info().

How does MCP server authentication work?

The MCP server (src/mcp/auth.py) supports Bearer JWT and API keys for SSE/HTTP transports. Authentication is not required for stdio transport. The --no-auth flag disables checks for development.

What attack protection mechanisms are implemented?

The system includes: path neutralization (db_path from request body is ignored), IDOR protection (owner_user_id verification), webhook replay protection (timestamp checks), OAuth CSRF protection (one-time state tokens), rate limiting (three-tier rate limiter), SQL query complexity validation, and DLP filtering.

How does token blacklisting work?

The is_token_blacklisted function checks whether a JWT token has been revoked. The blacklist is synchronized via _blacklist_sync_task() with a configurable interval (security.token_blacklist_sync_interval_seconds). Revoked tokens are blocked until expiry.


Code Property Graph (CPG)

What is a Code Property Graph (CPG)?

A CPG (Code Property Graph) is a unified representation of source code as a graph containing the Abstract Syntax Tree (AST), Control Flow Graph (CFG), and Data Flow Graph (DFG). In CodeGraph, the CPG is stored in DuckDB with tables such as nodes_method, nodes_type_decl, edges_call, and edges_inherits_from, plus views such as call_containment.

How is CPGQueryService structured?

CPGQueryService (src/services/cpg/) is the main client for working with the CPG through DuckDB. It uses a mixin pattern: base class CPGQueryBase (src/services/cpg/base.py) plus 11 query mixins (SubsystemQueries, CallGraph, Security, Performance, Quality, Semantic, Statistics, Comment, External, Type, Pattern). The DB path is resolved via ProjectManager.get_active_db_path().

What is call_containment?

The call_containment view in DuckDB links function calls to their containing methods. Fields: containing_method_name (calling method) and callee_name (called method). Used for building call graphs in MechanismExplanationHandler.

What does the is_test field in nodes_method mean?

The boolean is_test field indicates whether a method is a test function (determined by GoCPG based on naming and location). Used to filter out test functions from analysis results for non-test queries.


GoCPG — CPG Generator

What is GoCPG?

GoCPG (gocpg/) is a Code Property Graph generator written in Go. It supports 11 programming languages: C, C++, Go, Python, JavaScript, TypeScript, Java, Kotlin, C#, PHP, and 1C:Enterprise. Uses tree-sitter for parsing, requires CGO_ENABLED=1.

How to run GoCPG?

cd gocpg
CGO_ENABLED=1 go build -o gocpg ./cmd/gocpg  # Build
./gocpg parse --input=/path/to/source --output=out.duckdb --lang=python  # Parse
./gocpg stats --db=out.duckdb  # Stats
./gocpg query --db=out.duckdb --sql="SELECT COUNT(*) FROM nodes_method"  # Query
./gocpg serve --port 50051 --data-dir ./data/projects  # gRPC server

How does GoCPG integrate with Python?

The Python client (src/services/gocpg/) supports two transports: gRPC (grpc_transport.py) and subprocess. The transport configuration ("auto", "grpc", "subprocess") determines the mode. In "auto" mode, gRPC is tried first, then subprocess.

What tables does GoCPG create?

GoCPG creates 43 node tables, 25 edge tables, 2 views, and several state/domain tables. Key tables: nodes_method (methods/functions), nodes_call (call sites), nodes_type_decl (type declarations), nodes_file (source files), edges_call (call edges), edges_ast (AST structure), edges_cfg (control flow), edges_reaching_def (data flow reaching definitions), edges_inherits_from (inheritance). Views: call_containment (denormalized caller/callee) and method_docstrings. State: cpg_git_state, cpg_file_state, cpg_fqn_index. Full schema in docs/reference/en/SCHEMA.md.

What is the GoCPG pass pipeline?

GoCPG runs 31 analysis passes in a DAG (directed acyclic graph) order: base passes (MetaData, File, Namespace) → type passes (TypeNode, TypeDecl, Inheritance) → CFG/Dominator/CDG → call resolution (CallGraph, CallResolution) → data flow (Ref, AliasAnalysis, ReachingDef, InterproceduralReachingDef) → PDG/DDG → enrichment (MethodMetrics, FindingGeneration, PatternMatch). Each pass reads CPGGraph (in-memory), writes DiffGraph, never touches DuckDB directly.

What is incremental update in GoCPG?

GoCPG supports git-based incremental updates via gocpg update or gocpg ci-update. It compares file hashes in cpg_file_state against the working tree, re-parses only changed files, and rebuilds affected analysis passes. For CI, --base-ref limits analysis to files changed since a specific git reference. Falls back to full re-parse when changes exceed --incremental-threshold (default: 30 files).
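
The threshold fallback can be stated as a one-line decision. A sketch only; the default mirrors the --incremental-threshold value quoted above.

```python
# Incremental update while the change set is small; full re-parse once the
# number of changed files exceeds the threshold (default 30, per the docs).
def choose_update_mode(changed_files: int, incremental_threshold: int = 30) -> str:
    return "full" if changed_files > incremental_threshold else "incremental"
```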

What is TKB (Type Knowledge Base)?

TKB is a type knowledge base in YAML format (gocpg/configs/typestubs/) containing 46 files for 11 languages. Provides return type information for standard library functions without analyzing source code.


Hybrid Search and Retrieval

How does hybrid search work?

HybridRetriever (src/retrieval/hybrid/retriever.py) performs parallel search across ChromaDB (vector) and DuckDB (graph), then merges results using adaptive RRF (Reciprocal Rank Fusion). Weights depend on query type: semantic (75% vector / 25% graph), structural (25% / 75%), security (40% / 60%), default (60% / 40%).

What is stored in ChromaDB?

ChromaDB contains several collections: code_comments (code documentation and comments), code_snippets (code fragments), and domain_patterns (domain patterns). Collections are isolated per project via name prefixes (e.g., codegraph_code_comments). Indexing is performed by scripts/index_codegraph_vectors.py.

What is the Pre-Retrieval phase (Phase E)?

The pre_retrieval node in the LangGraph graph (src/workflow/orchestration/pre_retrieval.py) runs HybridRetriever between intent classification and routing. Results are stored in state["pre_retrieval_results"] and available to all scenario handlers. Enabled by default (config.yaml → workflows.pre_retrieval.enable: true).

How does RRF merging work?

RRF (Reciprocal Rank Fusion) is a method for merging ranked lists. For each document: score = Σ (weight / (k + rank_i)), where k is a smoothing constant, rank_i is the position in the i-th list, and weight is the source weight. Results are sorted by final score. Implementation in src/retrieval/hybrid/merger.py.
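
The formula above translates directly into a small merge function. A minimal sketch, not the actual src/retrieval/hybrid/merger.py implementation; k = 60 is a common default in RRF literature and is an assumption here.

```python
# Minimal weighted RRF merge: score(doc) = sum over sources of w / (k + rank),
# where rank is the 1-based position of doc in that source's ranked list.
def rrf_merge(ranked_lists: dict[str, list[str]],
              weights: dict[str, float],
              k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for source, docs in ranked_lists.items():
        w = weights.get(source, 1.0)
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

merged = rrf_merge(
    {"vector": ["a", "b", "c"], "graph": ["b", "c", "d"]},
    {"vector": 0.6, "graph": 0.4},  # the "default" weight split quoted above
)
```

Note how "b", ranked second by the vector source but first by the graph source, outranks "a", which only one source returned.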


Configuration and Setup

What is get_unified_config?

The get_unified_config() function (src/config/unified_config.py) is the single access point for configuration. It returns a Pydantic-based UnifiedConfig object loaded from config.yaml on first call (singleton via get_instance()). Values are accessed via attributes: config.llm.provider, config.timeouts.global_query_timeout, config.reranking.boost_domain_match.
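
The load-once-then-reuse pattern can be sketched with stdlib caching. This is an illustration, not the real implementation: the actual UnifiedConfig is a Pydantic model parsed from config.yaml, and the two fields here stand in for the full section tree.

```python
from dataclasses import dataclass
from functools import lru_cache

# Illustrative stand-in for the Pydantic UnifiedConfig.
@dataclass(frozen=True)
class UnifiedConfig:
    llm_provider: str = "yandex"
    global_query_timeout: int = 60

@lru_cache(maxsize=1)
def get_unified_config() -> UnifiedConfig:
    # The real implementation loads and validates config.yaml on first call;
    # lru_cache(maxsize=1) gives the same singleton behavior for the sketch.
    return UnifiedConfig()
```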

What sections does config.yaml have?

Main sections: llm (LLM provider and settings), api (port, auth, demo mode), workflows (scenario settings, pre-retrieval, enrichment), security (rate limit, DLP, CSP), composition (composite orchestrators), timeouts (service timeouts), projects (project registry), domain (active domain).

How to configure the LLM provider?

In config.yaml → llm.provider, set yandex, gigachat, openai, or local. Environment variables: YANDEX_API_KEY/YANDEX_FOLDER_ID for Yandex AI, GIGACHAT_AUTH_KEY for GigaChat, OPENAI_API_KEY for OpenAI. For local Qwen3: QWEN3_MODEL_PATH.

How to switch the active project?

Projects are registered in config.yaml → projects.registry or in projects.yaml. The active project is set via projects.active. When switching, the project’s domain plugin is automatically activated via DomainRegistry.activate().


Project Import

How to import a project?

python -m src.cli import /path/to/source --language python

Or programmatically:

from src.project_import import import_project
result = await import_project(repo_url="https://...", language="python")

What steps does import include?

Full pipeline (src/project_import/pipeline.py): CloneStep → DetectLanguageStep → GoCPGParseStep → ValidateStep → ChromaDBSyncStep → DocGenerationStep → VectorIndexStep → DomainSetupStep. Incremental (when DB exists): GoCPGUpdateStep → ValidateStep → ChromaDBSyncStep → DomainSetupStep. Automatically detected by the presence of cpg_git_state rows. DocGenerationStep auto-generates documentation from CPG when a docs/ directory is missing.

How to update ChromaDB indexes?

python scripts/index_codegraph_vectors.py

The script indexes code (snippets) and documentation (Q&A pairs) into ChromaDB collections. When documentation is updated, re-indexing is recommended to keep enrichment context current.


Answer Enrichment

How does enrichment work?

Enrichment (src/workflow/scenarios/onboarding/enrichment.py) runs in three phases:

1. Phase 1 — extract descriptions from CPG comments (docstrings and annotations)
2. Phase 2 — retrieve context from ChromaDB: Q&A pairs (code_comments) and code examples (code_snippets)
3. Phase 3 — LLM synthesis: a prompt combines the original answer, CPG descriptions, and vector context to generate an enriched answer

Can LLM enrichment be disabled?

Yes, via config.yaml → workflows.onboarding.enrichment.enable: false. Handlers can also set skip_enrichment=True on OnboardingResult for structured responses (tables, lists) to prevent LLM from rewriting them.

What LLM parameters are used for enrichment?

llm_max_tokens and llm_temperature are set in config.yaml → workflows.onboarding.enrichment. Tokens scale adaptively: max(base_max_tokens, len(original_answer) // 3 + 200) to prevent truncation of long answers. Defaults: llm_max_tokens: 1000, llm_temperature: 0.3.
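
The adaptive rule quoted above, written out as a function:

```python
# Token budget grows with answer length so long answers are not truncated;
# short answers keep the configured base budget.
def enrichment_max_tokens(original_answer: str, base_max_tokens: int = 1000) -> int:
    return max(base_max_tokens, len(original_answer) // 3 + 200)
```

For a 300-character answer the base budget of 1000 wins; for a 6000-character answer the budget scales to 6000 // 3 + 200 = 2200.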

What is _has_rich_structure?

The _has_rich_structure() function determines whether an answer contains structured formatting (headings ##, call chains, caller/callee lists). For such answers, Phase 1 (CPG comments) skips reformatting via DefinitionFormatter, preserving the original structure.


Scenario Workflows

How is a workflow handler structured?

Each scenario is implemented as a function {name}_workflow(state: MultiScenarioState) -> MultiScenarioState. The handler receives the state with the query, pre-retrieval results, and metadata, performs analysis via CPG and/or LLM, and populates answer, evidence, and metadata fields in the state.

What are composite workflows?

Composite workflows orchestrate multiple sub-scenarios: S18 (code optimization) runs S02, S05, S06, S11, S12 in parallel (60s timeout), S19 (standards check) runs S08, S17, S18 sequentially (45s), Audit runs 9 sub-scenarios in parallel (600s). Conflicts are resolved via priority with security boost (1.5x) and compliance boost (1.3x).
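
The priority-with-boost resolution can be sketched as a weighted sort. The boost factors come from the text above; the finding fields ("category", "priority") are assumed names, not the actual data model.

```python
# Security findings get a 1.5x priority boost, compliance 1.3x (per the docs);
# everything else keeps its raw priority.
BOOSTS = {"security": 1.5, "compliance": 1.3}

def resolve_conflicts(findings: list[dict]) -> list[dict]:
    def boosted(finding: dict) -> float:
        return finding["priority"] * BOOSTS.get(finding["category"], 1.0)
    # Highest boosted priority wins the conflict.
    return sorted(findings, key=boosted, reverse=True)
```

With this rule a security finding at priority 0.7 (boosted to 1.05) outranks a performance finding at 0.9.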

What base class should a handler use?

All scenario handlers inherit from BaseHandler (src/workflow/scenarios/_base/handler.py). Do not use AnalysisHandler (src/workflow/handlers/analysis.py) — its constructor signature is incompatible with the scenario registry.

How does onboarding processing (S01) work?

onboarding_workflow (src/workflow/scenarios/onboarding/workflow.py) uses the OnboardingHandlerRegistry. The detect_onboarding_query_type() function determines the query type (definition, call graph, mechanism, subsystem, etc.) and selects the handler. Supported types: definition, call_graph, mechanism_explain, subsystem, file_structure, key_functions, metrics.


Multi-Tenancy and API

How does multi-tenancy work?

When multi_tenant.enabled: true, the API is scoped by the X-Project-Id header. ProjectContext is a frozen dataclass with fields project_id, group_id, db_path, domain, language, collection_prefix. ProjectScopedServices caches CPGQueryService and VectorStore instances by db_path/collection_prefix.
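
The ProjectContext shape described above can be sketched as a frozen dataclass (field types are assumptions based on this description):

```python
from dataclasses import dataclass, FrozenInstanceError

# Illustrative frozen ProjectContext with the fields listed above.
@dataclass(frozen=True)
class ProjectContext:
    project_id: str
    group_id: str
    db_path: str
    domain: str
    language: str
    collection_prefix: str

ctx = ProjectContext("proj1", "grp1", "/data/proj1.duckdb",
                     "python_generic", "python", "codegraph_proj1_")
```

frozen=True makes instances immutable and hashable, which is what makes it safe to share a context across a request and to cache services keyed by its fields.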

How is the database path protected?

All API routers use ctx.db_path from ProjectContext. User-supplied db_path from request body or query params is ignored with a deprecation warning. CPGQueryBase instances accept allowed_db_paths — checked on every query.

What access interfaces are available?

Four access layers: CLI (python -m src.cli), REST API (/api/v1/), MCP server (python -m src.mcp), ACP protocol (src/acp/). REST API is documented at http://localhost:8000/api/docs.

How does demo mode work?

When api.demo.enabled: true, the /api/v1/demo/chat endpoint is accessible without authentication. It accepts {"query": "..."}, runs the full pipeline (classification → processing → enrichment), and returns {"answer": "...", "scenario_id": "...", "processing_time_ms": ...}.


Analysis Modules

What analysis modules are available?

CodeGraph includes several analysis modules in src/analysis/:

- Clone Detection (clone_detector.py) — finds duplicate code across 4 clone types
- Taint Analysis (dataflow/taint_analysis.py) — tracks untrusted data propagation
- Symbolic Execution (dataflow/symbolic_execution.py) — path feasibility checking via Z3
- Race Condition Detection (race_detector.py) — finds TOCTOU and synchronization issues
- Concurrency Analysis (concurrency_analyzer.py) — detects deadlocks and lock ordering violations
- Compliance Mapping (compliance.py) — maps findings to CWE, OWASP, CERT, MISRA standards
- Call Graph Analysis (callgraph/) — PageRank, SCC, betweenness centrality, path finding
- Autofix Engine (autofix/) — template and LLM-assisted vulnerability fix generation

How does clone detection work?

ASTCloneDetector (src/analysis/clone_detector.py) compares methods pairwise using 4 similarity metrics in order: token Jaccard (>0.95 → Type-1 exact), normalized tokens (>0.85 → Type-2 renamed), AST structure (>0.75 → Type-3 structural), control flow (>0.70 → Type-4 semantic). Returns CloneResult objects sorted by similarity. Default limit: 300 methods (max_methods_extended). See docs/reference/en/CLONE_DETECTION.md.

How does taint analysis work?

TaintPropagator (src/analysis/dataflow/taint_analysis.py) traces untrusted data flow through the CPG using edges_reaching_def and edges_call. Sources and sinks are defined per domain plugin (get_taint_sources(), get_taint_sinks()). Uses field-sensitive and inter-procedural analysis. Integrates with PathConstraintTracker for symbolic path feasibility checking via Z3. See docs/reference/en/SYMBOLIC_EXECUTION.md.

How does the autofix engine work?

The autofix engine (src/analysis/autofix/) generates code fix suggestions for detected vulnerabilities. It reads the actual source file, maps the vulnerability to a CWE ID, selects a fix template or generates an LLM-assisted patch, validates the diff, and outputs a unified diff. Supports both template-based (deterministic) and LLM-based (creative) fix generation.

How are dead methods identified?

Dead methods are identified via AuditRunner._collect_metrics() with multi-layered false positive filtering: is_test filter (V25), class-aware reachability (V26), inheritance (V26b), nested functions (V27), fully dead modules (V28), low-vitality modules (V29), companion files (V30). GoCPG creates synthetic CALL edges for __init__.py re-exports, callable keyword arguments (callback=fn), and framework registration patterns (LangGraph add_node, FastAPI include_router).


Security Hypothesis System

What is the security hypothesis system?

The hypothesis system (src/security/hypothesis/) is a multi-criteria vulnerability detection engine. It combines a knowledge base (58 CWE entries, 27 CAPEC entries, 13 detection templates) with CPG analysis. It generates hypotheses about potential vulnerabilities, synthesizes DuckDB/PGQ queries to verify them, and scores results using a multi-criteria scorer.

How does hypothesis generation work?

HypothesisExecutor (src/security/hypothesis/executor.py) takes a security concern (e.g., “SQL injection in user input handling”) and: (1) matches it against CWE/CAPEC knowledge bases, (2) selects applicable detection templates, (3) synthesizes CPG queries targeting the vulnerability pattern, (4) executes queries against the project’s DuckDB, (5) scores findings using the multi-criteria scorer, (6) returns ranked hypotheses with evidence.

What are hypothesis providers?

Providers supply domain-specific vulnerability patterns. Built-in providers exist for PostgreSQL, Django, Express, Spring, Gin, and Next.js frameworks (src/security/hypothesis/{framework}/). A YAML-based provider (providers/yaml_provider.py) allows custom pattern definitions. Providers are auto-discovered based on the active domain plugin.

How does the multi-criteria scorer work?

MultiCriteriaScorer (src/security/hypothesis/multi_criteria_scorer.py) evaluates hypotheses across multiple dimensions: data flow reachability, code context, framework-specific patterns, and historical feedback. Three presets are available: embedded (strict, for safety-critical systems), web (balanced, for web applications), and enterprise (compliance-oriented). Each preset adjusts the weight distribution across scoring criteria.

How does the feedback loop work?

FeedbackStore (src/security/hypothesis/feedback.py) tracks analyst decisions (confirmed/false-positive/deferred) on hypotheses. TrendStore (src/security/hypothesis/trend_store.py) aggregates feedback over time to identify recurring patterns. This feedback reduces false positives in subsequent runs and helps calibrate the multi-criteria scorer.


Documentation and Changelog Generation

How does documentation generation work?

DocumentationGenerator (src/services/doc_generator.py) auto-generates 8 documentation sections from CPG data: project overview (MVD), module overview, function documentation, pipeline documentation, business logic description, type documentation, coverage reports, and call graph diagrams. Output can be indexed into ChromaDB for vector search.

How to generate documentation?

python -m src.cli.generate_docs full --output ./docs/generated --language en
python -m src.cli.generate_docs full --output ./docs/generated --language ru  # Russian
python -m src.cli.generate_docs --sections overview,functions --output ./docs/generated

Or via MCP: codegraph_reindex action=generate. During project import, DocGenerationStep runs automatically when the project has no docs/ directory.

How does changelog generation work?

ChangelogGenerator (src/changelog/generator.py) parses git history for conventional commits (feat:, fix:, docs:, etc.), extracts scope and breaking change markers, groups changes by type, resolves issue references, and generates structured Markdown changelogs. Supports LLM enhancement for more descriptive entries and bilingual output (EN/RU).

How to generate a changelog?

python -m src.cli.changelog_commands --from v1.0.0 --to HEAD
python -m src.cli.changelog_commands --from v1.0.0 --to HEAD --language ru --enhance

Or via REST API: POST /api/v1/changelog/generate with {"from_ref": "v1.0.0", "to_ref": "HEAD"}.

How does vector reindexing work?

VectorIndexer (src/retrieval/vector_indexer.py) indexes project data into 6 ChromaDB collections: code_snippets, qa_pairs, documentation, sql_examples, code_comments, and domain_patterns. Supports incremental indexing via file state tracking. CLI: python -m src.cli.reindex_commands reindex [--generate] [--sections ...]. MCP: codegraph_reindex action=reindex.


Domain Plugins

What is a domain plugin?

A domain plugin (DomainPluginV3) provides domain-specific knowledge for a programming language or framework. Each plugin defines: module/subsystem names, taint sources and sinks, operation categories, security patterns, code quality rules, and domain-specific prompts. Configured via 10 YAML files per domain in src/domains/{name}/config/.

What domains are available?

13 domains: python_generic (default), python_django, java, csharp, go, javascript, kotlin, php, postgresql, cpp, bsl_1c, typescript, web. The active domain is set in config.yaml → domain.name or per-project in the project registry.

How does domain auto-detection work?

CPGConfigLoader (src/config/cpg_config.py) uses file extension heuristics from the CPG to detect the project’s primary language. When a project is switched via ProjectManager.switch_project(), the corresponding domain plugin is automatically activated via DomainRegistry.activate().

How to create a custom domain plugin?

Create a directory src/domains/{name}/ with a plugin class inheriting DomainPluginV3 and 10 YAML config files: modules.yaml, subsystems.yaml, taint_sources.yaml, taint_sinks.yaml, operation_categories.yaml, security_patterns.yaml, quality_rules.yaml, prompts.yaml, domain_config.yaml, compliance_mapping.yaml. Register the plugin in the domain registry.


Pattern Engine

What is the pattern engine?

The GoCPG pattern engine (gocpg/pkg/patterns/) provides structural code search with semantic constraints. It operates in 4 phases: (1) Structural — tree-sitter CST matching with metavariables ($VAR, $$$VARS, $_), (2) CPG constraints — data-flow, call context, type, and metric filters, (3) Incremental — dependency tracking for efficient re-analysis, (4) Rewrite — template-based autofix with metavariable substitution.

How to search for patterns?

# Structural search (no CPG needed)
gocpg search --pattern "malloc($SIZE)" --lang c --input /path/to/source

# CPG-aware scan with rules
gocpg scan --db=out.duckdb --rules=configs/rules/common/
gocpg scan --db=out.duckdb --domain=postgresql --format sarif

# With autofix
gocpg scan --db=out.duckdb --rules=configs/rules/ --fix --input=./src --dry-run

How are pattern rules defined?

Rules are YAML files in configs/rules/ (190 rules across 14 directories):

id: rule-id
message: "Description with $VAR"
severity: error|warning|info|hint
languages: [c, cpp]
rule:
  pattern: "malloc($SIZE)"
fix: "safe_malloc($SIZE)"          # Optional rewrite template
constraints: { SIZE: { regex: "^[a-z]" } }
cpg:
  dataflow: { from: "$SRC", to: "$SINK" }
  metrics: { cyclomatic_complexity: ">10" }

Can LLM generate pattern rules?

Yes. LLMPatternGenerator (src/analysis/patterns/llm_pattern_generator.py) converts natural language descriptions to YAML pattern rules. It auto-validates generated rules via gocpg validate-rule and retries on failure. System prompts are loaded from bilingual YAML.


MCP Server and Tools

What is the MCP server?

The MCP (Model Context Protocol) server (src/mcp/) exposes CodeGraph functionality to AI assistants. It supports 21 built-in tools plus dynamic SQL tools loaded from .codegraph/tools.yaml. Transports: stdio (default), SSE, and HTTP.

How to start the MCP server?

python -m src.mcp                              # stdio (default)
python -m src.mcp --transport sse --port 27495  # SSE
python -m src.mcp --transport http --port 27495 # HTTP
python -m src.mcp --no-auth                     # Disable auth (dev only)

What MCP tools are available?

Key tools: codegraph_query (natural language query), codegraph_search (code search), codegraph_callgraph (call graph analysis), codegraph_hotspots (complexity hotspots), codegraph_security (security analysis), codegraph_patterns (pattern scan), codegraph_compliance (compliance check), codegraph_tech_debt (tech debt analysis), codegraph_explain (code explanation), codegraph_standards_check (coding standards), codegraph_reindex (vector reindex), codegraph_hypothesis (security hypothesis), codegraph_docs_sync (documentation drift detection).

What are dynamic SQL tools?

Dynamic SQL tools are defined in .codegraph/tools.yaml per project. Each tool specifies a name, description, SQL query template, and parameters. The MCP server loads these at startup and exposes them alongside built-in tools. This allows project-specific CPG queries without code changes.
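
A hypothetical tools.yaml entry illustrating the shape described above. The field names here are assumptions based on this description, not the actual schema; only the nodes_method table it queries is documented elsewhere in this FAQ.

```yaml
# .codegraph/tools.yaml (illustrative; field names assumed)
tools:
  - name: largest_methods
    description: "Top N methods by line count"
    sql: >
      SELECT name, end_line - start_line AS loc
      FROM nodes_method
      ORDER BY loc DESC
      LIMIT $limit
    parameters:
      - name: limit
        type: integer
        default: 10
```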

What is the interface documentation sync tool?

codegraph_docs_sync runs the 5-phase interface documentation sync composite (src/workflow/scenarios/interface_docs_sync_composite.py): Discovery → DocParse → Generation → DriftDetect → Report. It validates 6 interfaces (REST API, CLI, MCP, ACP, gRPC, WebSocket) against their documentation and reports drift types: UNDOCUMENTED, STALE, OUTDATED, COVERED.