Frequently Asked Questions (FAQ)

Detailed answers about CodeGraph and GoCPG architecture and subsystems.

Orchestration and Routing

How does scenario orchestration work?

Orchestration is implemented via a LangGraph graph with three nodes: classify_intent_node → pre_retrieval → route_by_intent. The MultiScenarioCopilot class (src/workflow/orchestration/copilot.py) builds this graph and executes it via run(query). At the classify_intent_node step, an LLM provider (Yandex AI, GigaChat, or OpenAI) determines the user’s query type. Then route_by_intent (src/workflow/orchestration/router.py) selects one of 21 scenarios for processing.

What scenarios does the system support?

The system supports 21 scenarios: S01 (onboarding), S02 (security), S03 (documentation), S04 (feature development), S05 (refactoring), S06 (performance), S07 (test coverage), S08 (compliance), S09 (code review), S10 (cross-repo analysis), S11 (architecture), S12 (tech debt), S13 (mass refactoring), S14 (security incident), S15 (debugging), S16 (entry points), S17 (file editing), S18 (code optimization — composite), S19 (standards check — composite), S20 (dependencies), S21 (structural pattern search).

What is route_by_intent and how does it work?

The route_by_intent function (src/workflow/orchestration/router.py:16) takes a MultiScenarioState with an intent field (populated during classification) and returns the name of the next workflow node. Internally, it uses a routing_map dictionary that maps intent values to handler function names (e.g., "onboarding" → "onboarding_workflow", "tech_debt" → "tech_debt_workflow"). It supports composite mode for complex queries.
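
The dictionary-based routing described above can be sketched as follows. This is a minimal illustration, not the actual router: the real routing_map covers all 21 scenarios, and the fallback to onboarding for unknown intents is an assumption.

```python
# Hypothetical sketch of intent-based routing in the spirit of route_by_intent.
# The real routing_map lives in src/workflow/orchestration/router.py;
# these three entries are illustrative.
routing_map = {
    "onboarding": "onboarding_workflow",
    "tech_debt": "tech_debt_workflow",
    "security": "security_workflow",
}

def route_by_intent(state: dict) -> str:
    """Return the name of the next workflow node for the classified intent."""
    # Unknown intents fall back to onboarding (assumed default behavior).
    return routing_map.get(state.get("intent"), "onboarding_workflow")
```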

What is MultiScenarioCopilot?

MultiScenarioCopilot (src/workflow/orchestration/copilot.py) is the system’s central orchestrator. It builds the LangGraph graph, runs intent classification and routing, then hands off control to the appropriate workflow handler. The run(query) method returns a state dict with fields query, intent, scenario_id, confidence, answer, evidence, and metadata.

What is classify_intent_node?

The classify_intent_node (src/workflow/orchestration/intent.py) performs LLM-based classification of the user’s intent. It takes the query text as input and populates intent, scenario_id, and confidence fields in the MultiScenarioState. It uses a prompt from config/prompts/{lang}/workflow_prompts.yaml listing all scenarios.

What is the pre_retrieval node?

The pre_retrieval node (src/workflow/orchestration/pre_retrieval.py) is an optional step between classification and routing (Phase E). It runs HybridRetriever.retrieve() to pre-fetch context from ChromaDB and DuckDB. Results are stored in state["pre_retrieval_results"]. Enabled via config.yaml → workflows.pre_retrieval.enable: true (on by default).


Authorization and Security

How is API authorization implemented?

Authorization is implemented through several mechanisms: JWT tokens, API keys, and OAuth 2.0. The central FastAPI dependency — get_auth_context (src/api/auth/middleware.py) — extracts and validates user data from each incoming request. The result is placed in an AuthContext container with fields user_id, username, role, scopes, and auth_method.

How does permission checking work?

The has_permission function (src/api/auth/middleware.py) checks whether a user has a specific permission based on their role (VIEWER, ANALYST, REVIEWER, ADMIN). Roles and permissions are defined in src/api/database/models.py. The system uses RBAC (Role-Based Access Control) with extensibility via scopes.
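
The role-to-permission check can be sketched like this. It is a hedged illustration: the real roles and permissions are defined in src/api/database/models.py, and the permission names used here are assumptions.

```python
# Hypothetical RBAC sketch in the spirit of has_permission; the permission
# names are illustrative, the four roles mirror those listed above.
ROLE_PERMISSIONS = {
    "VIEWER":   {"read"},
    "ANALYST":  {"read", "analyze"},
    "REVIEWER": {"read", "analyze", "review"},
    "ADMIN":    {"read", "analyze", "review", "admin"},
}

def has_permission(role: str, permission: str,
                   scopes: frozenset = frozenset()) -> bool:
    """Grant if the role carries the permission, or an explicit scope does."""
    return permission in ROLE_PERMISSIONS.get(role, set()) or permission in scopes
```

The scopes parameter models the extensibility mentioned above: a scope can grant a permission the base role lacks.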

What OAuth providers are supported?

Two OAuth providers are supported: SourceCraft (via Yandex ID, SourceCraftOAuth in src/api/auth/oauth.py) and GitVerse (via Sber ID, GitVerseOAuth). Both inherit a shared interface and support get_authorization_url(), exchange_code(), and get_user_info().

How does MCP server authentication work?

The MCP server (src/mcp/auth.py) supports Bearer JWT and API keys for SSE/HTTP transports. Authentication is not required for stdio transport. The --no-auth flag disables checks for development.

What attack protection mechanisms are implemented?

The system includes: path neutralization (db_path from request body is ignored), IDOR protection (owner_user_id verification), webhook replay protection (timestamp checks), OAuth CSRF protection (one-time state tokens), rate limiting (three-tier rate limiter), SQL query complexity validation, and DLP filtering.

How does token blacklisting work?

The is_token_blacklisted function checks whether a JWT token has been revoked. The blacklist is synchronized via _blacklist_sync_task() with a configurable interval (security.token_blacklist_sync_interval_seconds). Revoked tokens are blocked until expiry.


Code Property Graph (CPG)

What is a Code Property Graph (CPG)?

A CPG (Code Property Graph) is a unified representation of source code as a graph containing the Abstract Syntax Tree (AST), Control Flow Graph (CFG), and Data Flow Graph (DFG). In CodeGraph, the CPG is stored in DuckDB with tables such as nodes_method, nodes_type_decl, edges_call, and edges_inherits_from, plus views such as call_containment.

How is CPGQueryService structured?

CPGQueryService (src/services/cpg/) is the main client for working with the CPG through DuckDB. It uses a mixin pattern: base class CPGQueryBase (src/services/cpg/base.py) plus 11 query mixins (SubsystemQueries, CallGraph, Security, Performance, Quality, Semantic, Statistics, Comment, External, Type, Pattern). The DB path is resolved via ProjectManager.get_active_db_path().

What is call_containment?

The call_containment view in DuckDB links function calls to their containing methods. Fields: containing_method_name (calling method) and callee_name (called method). Used for building call graphs in MechanismExplanationHandler.

What does the is_test field in nodes_method mean?

The boolean is_test field indicates whether a method is a test function (determined by GoCPG based on naming and location). Used to filter out test functions from analysis results for non-test queries.


GoCPG — CPG Generator

What is GoCPG?

GoCPG (gocpg/) is a Code Property Graph generator written in Go. It supports 11 programming languages: C, C++, Go, Python, JavaScript, TypeScript, Java, Kotlin, C#, PHP, and 1C:Enterprise. Uses tree-sitter for parsing, requires CGO_ENABLED=1.

How to run GoCPG?

cd gocpg
CGO_ENABLED=1 go build -o gocpg ./cmd/gocpg  # Build
./gocpg parse --input=/path/to/source --output=out.duckdb --lang=python  # Parse
./gocpg stats --db=out.duckdb  # Stats
./gocpg query --db=out.duckdb --sql="SELECT COUNT(*) FROM nodes_method"  # Query
./gocpg serve --port 50051 --data-dir ./data/projects  # gRPC server

How does GoCPG integrate with Python?

The Python client (src/services/gocpg/) supports two transports: gRPC (grpc_transport.py) and subprocess. The transport configuration ("auto", "grpc", "subprocess") determines the mode. In "auto" mode, gRPC is tried first, then subprocess.

What tables does GoCPG create?

GoCPG creates 43 node tables, 25 edge tables, 2 views, and several state/domain tables. Key tables: nodes_method (methods/functions), nodes_call (call sites), nodes_type_decl (type declarations), nodes_file (source files), edges_call (call edges), edges_ast (AST structure), edges_cfg (control flow), edges_reaching_def (data flow reaching definitions), edges_inherits_from (inheritance). Views: call_containment (denormalized caller/callee) and method_docstrings. State: cpg_git_state, cpg_file_state, cpg_fqn_index. Full schema in docs/reference/en/SCHEMA.md.

What is the GoCPG pass pipeline?

GoCPG runs 31 analysis passes in a DAG (directed acyclic graph) order: base passes (MetaData, File, Namespace) → type passes (TypeNode, TypeDecl, Inheritance) → CFG/Dominator/CDG → call resolution (CallGraph, CallResolution) → data flow (Ref, AliasAnalysis, ReachingDef, InterproceduralReachingDef) → PDG/DDG → enrichment (MethodMetrics, FindingGeneration, PatternMatch). Each pass reads CPGGraph (in-memory), writes DiffGraph, never touches DuckDB directly.

What is incremental update in GoCPG?

GoCPG supports git-based incremental updates via gocpg update or gocpg ci-update. It compares file hashes in cpg_file_state against the working tree, re-parses only changed files, and rebuilds affected analysis passes. For CI, --base-ref limits analysis to files changed since a specific git reference. Falls back to full re-parse when changes exceed --incremental-threshold (default: 30 files).
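
The threshold fallback can be stated as a one-line decision. A sketch only; the default mirrors the --incremental-threshold value quoted above.

```python
# Incremental update while the change set is small; full re-parse once the
# number of changed files exceeds the threshold (default 30, per the docs).
def choose_update_mode(changed_files: int, incremental_threshold: int = 30) -> str:
    return "full" if changed_files > incremental_threshold else "incremental"
```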

What is TKB (Type Knowledge Base)?

TKB is a type knowledge base in YAML format (gocpg/configs/typestubs/) containing 46 files for 11 languages. Provides return type information for standard library functions without analyzing source code.


Hybrid Search and Retrieval

How does hybrid search work?

HybridRetriever (src/retrieval/hybrid/retriever.py) performs parallel search across ChromaDB (vector) and DuckDB (graph), then merges results using adaptive RRF (Reciprocal Rank Fusion). Weights depend on query type: semantic (75% vector / 25% graph), structural (25% / 75%), security (40% / 60%), default (60% / 40%).

What is stored in ChromaDB?

ChromaDB contains several collections: code_comments (code documentation and comments), code_snippets (code fragments), and domain_patterns (domain patterns). Collections are isolated per project via name prefixes (e.g., codegraph_code_comments). Indexing is performed by scripts/index_codegraph_vectors.py.

What is the Pre-Retrieval phase (Phase E)?

The pre_retrieval node in the LangGraph graph (src/workflow/orchestration/pre_retrieval.py) runs HybridRetriever between intent classification and routing. Results are stored in state["pre_retrieval_results"] and available to all scenario handlers. Enabled by default (config.yaml → workflows.pre_retrieval.enable: true).

How does RRF merging work?

RRF (Reciprocal Rank Fusion) is a method for merging ranked lists. For each document: score = Σ (weight / (k + rank_i)), where k is a smoothing constant, rank_i is the position in the i-th list, and weight is the source weight. Results are sorted by final score. Implementation in src/retrieval/hybrid/merger.py.
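
The formula above translates directly into a small merge function. A minimal sketch, not the actual src/retrieval/hybrid/merger.py implementation; k = 60 is a common default in RRF literature and is an assumption here.

```python
# Minimal weighted RRF merge: score(doc) = sum over sources of w / (k + rank),
# where rank is the 1-based position of doc in that source's ranked list.
def rrf_merge(ranked_lists: dict[str, list[str]],
              weights: dict[str, float],
              k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for source, docs in ranked_lists.items():
        w = weights.get(source, 1.0)
        for rank, doc in enumerate(docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

merged = rrf_merge(
    {"vector": ["a", "b", "c"], "graph": ["b", "c", "d"]},
    {"vector": 0.6, "graph": 0.4},  # the "default" weight split quoted above
)
```

Note how "b", ranked second by the vector source but first by the graph source, outranks "a", which only one source returned.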


Configuration and Setup

What is get_unified_config?

The get_unified_config() function (src/config/unified_config.py) is the single access point for configuration. It returns a Pydantic-based UnifiedConfig object loaded from config.yaml on first call (singleton via get_instance()). Values are accessed via attributes: config.llm.provider, config.timeouts.global_query_timeout, config.reranking.boost_domain_match.
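
The load-once-then-reuse pattern can be sketched with stdlib caching. This is an illustration, not the real implementation: the actual UnifiedConfig is a Pydantic model parsed from config.yaml, and the two fields here stand in for the full section tree.

```python
from dataclasses import dataclass
from functools import lru_cache

# Illustrative stand-in for the Pydantic UnifiedConfig.
@dataclass(frozen=True)
class UnifiedConfig:
    llm_provider: str = "yandex"
    global_query_timeout: int = 60

@lru_cache(maxsize=1)
def get_unified_config() -> UnifiedConfig:
    # The real implementation loads and validates config.yaml on first call;
    # lru_cache(maxsize=1) gives the same singleton behavior for the sketch.
    return UnifiedConfig()
```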

What sections does config.yaml have?

Main sections: llm (LLM provider and settings), api (port, auth, demo mode), workflows (scenario settings, pre-retrieval, enrichment), security (rate limit, DLP, CSP), composition (composite orchestrators), timeouts (service timeouts), projects (project registry), domain (active domain).

How to configure the LLM provider?

In config.yaml → llm.provider, set yandex, gigachat, openai, or local. Environment variables: YANDEX_API_KEY/YANDEX_FOLDER_ID for Yandex AI, GIGACHAT_AUTH_KEY for GigaChat, OPENAI_API_KEY for OpenAI. For local Qwen3: QWEN3_MODEL_PATH.

How to switch the active project?

Projects are registered in config.yaml → projects.registry or in projects.yaml. The active project is set via projects.active. When switching, the project’s domain plugin is automatically activated via DomainRegistry.activate().


Project Import

How to import a project?

python -m src.cli import /path/to/source --language python

Or programmatically:

from src.project_import import import_project
result = await import_project(repo_url="https://...", language="python")

What steps does import include?

Full pipeline (src/project_import/pipeline.py): CloneStep → DetectLanguageStep → GoCPGParseStep → ValidateStep → ChromaDBSyncStep → DocGenerationStep → VectorIndexStep → DomainSetupStep. Incremental (when DB exists): GoCPGUpdateStep → ValidateStep → ChromaDBSyncStep → DomainSetupStep. Automatically detected by the presence of cpg_git_state rows. DocGenerationStep auto-generates documentation from CPG when a docs/ directory is missing.

How to update ChromaDB indexes?

python scripts/index_codegraph_vectors.py

The script indexes code (snippets) and documentation (Q&A pairs) into ChromaDB collections. When documentation is updated, re-indexing is recommended to keep enrichment context current.


Answer Enrichment

How does enrichment work?

Enrichment (src/workflow/scenarios/onboarding/enrichment.py) runs in three phases:

1. Phase 1 — extract descriptions from CPG comments (docstrings and annotations)
2. Phase 2 — retrieve context from ChromaDB: Q&A pairs (code_comments) and code examples (code_snippets)
3. Phase 3 — LLM synthesis: a prompt combines the original answer, CPG descriptions, and vector context to generate an enriched answer

Can LLM enrichment be disabled?

Yes, via config.yaml → workflows.onboarding.enrichment.enable: false. Handlers can also set skip_enrichment=True on OnboardingResult for structured responses (tables, lists) to prevent LLM from rewriting them.

What LLM parameters are used for enrichment?

llm_max_tokens and llm_temperature are set in config.yaml → workflows.onboarding.enrichment. Tokens scale adaptively: max(base_max_tokens, len(original_answer) // 3 + 200) to prevent truncation of long answers. Defaults: llm_max_tokens: 1000, llm_temperature: 0.3.
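
The adaptive rule quoted above, written out as a function:

```python
# Token budget grows with answer length so long answers are not truncated;
# short answers keep the configured base budget.
def enrichment_max_tokens(original_answer: str, base_max_tokens: int = 1000) -> int:
    return max(base_max_tokens, len(original_answer) // 3 + 200)
```

For a 300-character answer the base budget of 1000 wins; for a 6000-character answer the budget scales to 6000 // 3 + 200 = 2200.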

What is _has_rich_structure?

The _has_rich_structure() function determines whether an answer contains structured formatting (headings ##, call chains, caller/callee lists). For such answers, Phase 1 (CPG comments) skips reformatting via DefinitionFormatter, preserving the original structure.


Scenario Workflows

How is a workflow handler structured?

Each scenario is implemented as a function {name}_workflow(state: MultiScenarioState) -> MultiScenarioState. The handler receives the state with the query, pre-retrieval results, and metadata, performs analysis via CPG and/or LLM, and populates answer, evidence, and metadata fields in the state.

What are composite workflows?

Composite workflows orchestrate multiple sub-scenarios: S18 (code optimization) runs S02, S05, S06, S11, S12 in parallel (60s timeout), S19 (standards check) runs S08, S17, S18 sequentially (45s), Audit runs 9 sub-scenarios in parallel (600s). Conflicts are resolved via priority with security boost (1.5x) and compliance boost (1.3x).
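
The priority-with-boost resolution can be sketched as a weighted sort. The boost factors come from the text above; the finding fields ("category", "priority") are assumed names, not the actual data model.

```python
# Security findings get a 1.5x priority boost, compliance 1.3x (per the docs);
# everything else keeps its raw priority.
BOOSTS = {"security": 1.5, "compliance": 1.3}

def resolve_conflicts(findings: list[dict]) -> list[dict]:
    def boosted(finding: dict) -> float:
        return finding["priority"] * BOOSTS.get(finding["category"], 1.0)
    # Highest boosted priority wins the conflict.
    return sorted(findings, key=boosted, reverse=True)
```

With this rule a security finding at priority 0.7 (boosted to 1.05) outranks a performance finding at 0.9.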

What base class should a handler use?

All scenario handlers inherit from BaseHandler (src/workflow/scenarios/_base/handler.py). Do not use AnalysisHandler (src/workflow/handlers/analysis.py) — its constructor signature is incompatible with the scenario registry.

How does onboarding processing (S01) work?

onboarding_workflow (src/workflow/scenarios/onboarding/workflow.py) uses the OnboardingHandlerRegistry. The detect_onboarding_query_type() function determines the query type (definition, call graph, mechanism, subsystem, etc.) and selects the handler. Supported types: definition, call_graph, mechanism_explain, subsystem, file_structure, key_functions, metrics.


Multi-Tenancy and API

How does multi-tenancy work?

When multi_tenant.enabled: true, the API is scoped by the X-Project-Id header. ProjectContext is a frozen dataclass with fields project_id, group_id, db_path, domain, language, collection_prefix. ProjectScopedServices caches CPGQueryService and VectorStore instances by db_path/collection_prefix.
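
The ProjectContext shape described above can be sketched as a frozen dataclass (field types are assumptions based on this description):

```python
from dataclasses import dataclass, FrozenInstanceError

# Illustrative frozen ProjectContext with the fields listed above.
@dataclass(frozen=True)
class ProjectContext:
    project_id: str
    group_id: str
    db_path: str
    domain: str
    language: str
    collection_prefix: str

ctx = ProjectContext("proj1", "grp1", "/data/proj1.duckdb",
                     "python_generic", "python", "codegraph_proj1_")
```

frozen=True makes instances immutable and hashable, which is what makes it safe to share a context across a request and to cache services keyed by its fields.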

How is the database path protected?

All API routers use ctx.db_path from ProjectContext. User-supplied db_path from request body or query params is ignored with a deprecation warning. CPGQueryBase instances accept allowed_db_paths — checked on every query.

What access interfaces are available?

Four access layers: CLI (python -m src.cli), REST API (/api/v1/), MCP server (python -m src.mcp), ACP protocol (src/acp/). REST API is documented at http://localhost:8000/api/docs.

How does demo mode work?

When api.demo.enabled: true, the /api/v1/demo/chat endpoint is accessible without authentication. It accepts {"query": "..."}, runs the full pipeline (classification → processing → enrichment), and returns {"answer": "...", "scenario_id": "...", "processing_time_ms": ...}.


Analysis Modules

What analysis modules are available?

CodeGraph includes several analysis modules in src/analysis/:

- Clone Detection (clone_detector.py) — finds duplicate code across 4 clone types
- Taint Analysis (dataflow/taint_analysis.py) — tracks untrusted data propagation
- Symbolic Execution (dataflow/symbolic_execution.py) — path feasibility checking via Z3
- Race Condition Detection (race_detector.py) — finds TOCTOU and synchronization issues
- Concurrency Analysis (concurrency_analyzer.py) — detects deadlocks and lock ordering violations
- Compliance Mapping (compliance.py) — maps findings to CWE, OWASP, CERT, MISRA standards
- Call Graph Analysis (callgraph/) — PageRank, SCC, betweenness centrality, path finding
- Autofix Engine (autofix/) — template and LLM-assisted vulnerability fix generation

How does clone detection work?

ASTCloneDetector (src/analysis/clone_detector.py) compares methods pairwise using 4 similarity metrics in order: token Jaccard (>0.95 → Type-1 exact), normalized tokens (>0.85 → Type-2 renamed), AST structure (>0.75 → Type-3 structural), control flow (>0.70 → Type-4 semantic). Returns CloneResult objects sorted by similarity. Default limit: 300 methods (max_methods_extended). See docs/reference/en/CLONE_DETECTION.md.

How does taint analysis work?

TaintPropagator (src/analysis/dataflow/taint_analysis.py) traces untrusted data flow through the CPG using edges_reaching_def and edges_call. Sources and sinks are defined per domain plugin (get_taint_sources(), get_taint_sinks()). Uses field-sensitive and inter-procedural analysis. Integrates with PathConstraintTracker for symbolic path feasibility checking via Z3. See docs/reference/en/SYMBOLIC_EXECUTION.md.

How does the autofix engine work?

The autofix engine (src/analysis/autofix/) generates code fix suggestions for detected vulnerabilities. It reads the actual source file, maps the vulnerability to a CWE ID, selects a fix template or generates an LLM-assisted patch, validates the diff, and outputs a unified diff. Supports both template-based (deterministic) and LLM-based (creative) fix generation.

How are dead methods identified?

Dead methods are identified via AuditRunner._collect_metrics() with multi-layered false positive filtering: is_test filter (V25), class-aware reachability (V26), inheritance (V26b), nested functions (V27), fully dead modules (V28), low-vitality modules (V29), companion files (V30). GoCPG creates synthetic CALL edges for __init__.py re-exports, callable keyword arguments (callback=fn), and framework registration patterns (LangGraph add_node, FastAPI include_router).


Security Hypothesis System

What is the security hypothesis system?

The hypothesis system (src/security/hypothesis/) is a multi-criteria vulnerability detection engine. It combines a knowledge base (58 CWE entries, 27 CAPEC entries, 13 detection templates) with CPG analysis. It generates hypotheses about potential vulnerabilities, synthesizes DuckDB/PGQ queries to verify them, and scores results using a multi-criteria scorer.

How does hypothesis generation work?

HypothesisExecutor (src/security/hypothesis/executor.py) takes a security concern (e.g., “SQL injection in user input handling”) and: (1) matches it against CWE/CAPEC knowledge bases, (2) selects applicable detection templates, (3) synthesizes CPG queries targeting the vulnerability pattern, (4) executes queries against the project’s DuckDB, (5) scores findings using the multi-criteria scorer, (6) returns ranked hypotheses with evidence.

What are hypothesis providers?

Providers supply domain-specific vulnerability patterns. Built-in providers exist for PostgreSQL, Django, Express, Spring, Gin, and Next.js frameworks (src/security/hypothesis/{framework}/). A YAML-based provider (providers/yaml_provider.py) allows custom pattern definitions. Providers are auto-discovered based on the active domain plugin.

How does the multi-criteria scorer work?

MultiCriteriaScorer (src/security/hypothesis/multi_criteria_scorer.py) evaluates hypotheses across multiple dimensions: data flow reachability, code context, framework-specific patterns, and historical feedback. Three presets are available: embedded (strict, for safety-critical systems), web (balanced, for web applications), and enterprise (compliance-oriented). Each preset adjusts the weight distribution across scoring criteria.

How does the feedback loop work?

FeedbackStore (src/security/hypothesis/feedback.py) tracks analyst decisions (confirmed/false-positive/deferred) on hypotheses. TrendStore (src/security/hypothesis/trend_store.py) aggregates feedback over time to identify recurring patterns. This feedback reduces false positives in subsequent runs and helps calibrate the multi-criteria scorer.


Documentation and Changelog Generation

How does documentation generation work?

DocumentationGenerator (src/services/doc_generator.py) auto-generates 8 documentation sections from CPG data: project overview (MVD), module overview, function documentation, pipeline documentation, business logic description, type documentation, coverage reports, and call graph diagrams. Output can be indexed into ChromaDB for vector search.

How to generate documentation?

python -m src.cli.generate_docs full --output ./docs/generated --language en
python -m src.cli.generate_docs full --output ./docs/generated --language ru  # Russian
python -m src.cli.generate_docs --sections overview,functions --output ./docs/generated

Or via MCP: codegraph_reindex action=generate. During project import, DocGenerationStep runs automatically when the project has no docs/ directory.

How does changelog generation work?

ChangelogGenerator (src/changelog/generator.py) parses git history for conventional commits (feat:, fix:, docs:, etc.), extracts scope and breaking change markers, groups changes by type, resolves issue references, and generates structured Markdown changelogs. Supports LLM enhancement for more descriptive entries and bilingual output (EN/RU).

How to generate a changelog?

python -m src.cli.changelog_commands --from v1.0.0 --to HEAD
python -m src.cli.changelog_commands --from v1.0.0 --to HEAD --language ru --enhance

Or via REST API: POST /api/v1/changelog/generate with {"from_ref": "v1.0.0", "to_ref": "HEAD"}.

How does vector reindexing work?

VectorIndexer (src/retrieval/vector_indexer.py) indexes project data into 6 ChromaDB collections: code_snippets, qa_pairs, documentation, sql_examples, code_comments, and domain_patterns. Supports incremental indexing via file state tracking. CLI: python -m src.cli.reindex_commands reindex [--generate] [--sections ...]. MCP: codegraph_reindex action=reindex.


Domain Plugins

What is a domain plugin?

A domain plugin (DomainPluginV3) provides domain-specific knowledge for a programming language or framework. Each plugin defines: module/subsystem names, taint sources and sinks, operation categories, security patterns, code quality rules, and domain-specific prompts. Configured via 10 YAML files per domain in src/domains/{name}/config/.

What domains are available?

13 domains: python_generic (default), python_django, java, csharp, go, javascript, kotlin, php, postgresql, cpp, bsl_1c, typescript, web. The active domain is set in config.yaml → domain.name or per-project in the project registry.

How does domain auto-detection work?

CPGConfigLoader (src/config/cpg_config.py) uses file extension heuristics from the CPG to detect the project’s primary language. When a project is switched via ProjectManager.switch_project(), the corresponding domain plugin is automatically activated via DomainRegistry.activate().

How to create a custom domain plugin?

Create a directory src/domains/{name}/ with a plugin class inheriting DomainPluginV3 and 10 YAML config files: modules.yaml, subsystems.yaml, taint_sources.yaml, taint_sinks.yaml, operation_categories.yaml, security_patterns.yaml, quality_rules.yaml, prompts.yaml, domain_config.yaml, compliance_mapping.yaml. Register the plugin in the domain registry.


Pattern Engine

What is the pattern engine?

The GoCPG pattern engine (gocpg/pkg/patterns/) provides structural code search with semantic constraints. It operates in 4 phases: (1) Structural — tree-sitter CST matching with metavariables ($VAR, $$$VARS, $_), (2) CPG constraints — data-flow, call context, type, and metric filters, (3) Incremental — dependency tracking for efficient re-analysis, (4) Rewrite — template-based autofix with metavariable substitution.

How to search for patterns?

# Structural search (no CPG needed)
gocpg search --pattern "malloc($SIZE)" --lang c --input /path/to/source

# CPG-aware scan with rules
gocpg scan --db=out.duckdb --rules=configs/rules/common/
gocpg scan --db=out.duckdb --domain=postgresql --format sarif

# With autofix
gocpg scan --db=out.duckdb --rules=configs/rules/ --fix --input=./src --dry-run

How are pattern rules defined?

Rules are YAML files in configs/rules/ (190 rules across 14 directories):

id: rule-id
message: "Description with $VAR"
severity: error|warning|info|hint
languages: [c, cpp]
rule:
  pattern: "malloc($SIZE)"
fix: "safe_malloc($SIZE)"          # Optional rewrite template
constraints: { SIZE: { regex: "^[a-z]" } }
cpg:
  dataflow: { from: "$SRC", to: "$SINK" }
  metrics: { cyclomatic_complexity: ">10" }

Can LLM generate pattern rules?

Yes. LLMPatternGenerator (src/analysis/patterns/llm_pattern_generator.py) converts natural language descriptions to YAML pattern rules. It auto-validates generated rules via gocpg validate-rule and retries on failure. System prompts are loaded from bilingual YAML.


MCP Server and Tools

What is the MCP server?

The MCP (Model Context Protocol) server (src/mcp/) exposes CodeGraph functionality to AI assistants. It supports 21 built-in tools plus dynamic SQL tools loaded from .codegraph/tools.yaml. Transports: stdio (default), SSE, and HTTP.

How to start the MCP server?

python -m src.mcp                              # stdio (default)
python -m src.mcp --transport sse --port 27495  # SSE
python -m src.mcp --transport http --port 27495 # HTTP
python -m src.mcp --no-auth                     # Disable auth (dev only)

What MCP tools are available?

Key tools: codegraph_query (natural language query), codegraph_search (code search), codegraph_callgraph (call graph analysis), codegraph_hotspots (complexity hotspots), codegraph_security (security analysis), codegraph_patterns (pattern scan), codegraph_compliance (compliance check), codegraph_tech_debt (tech debt analysis), codegraph_explain (code explanation), codegraph_standards_check (coding standards), codegraph_reindex (vector reindex), codegraph_hypothesis (security hypothesis), codegraph_docs_sync (documentation drift detection).

What are dynamic SQL tools?

Dynamic SQL tools are defined in .codegraph/tools.yaml per project. Each tool specifies a name, description, SQL query template, and parameters. The MCP server loads these at startup and exposes them alongside built-in tools. This allows project-specific CPG queries without code changes.
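
A hypothetical tools.yaml entry illustrating the shape described above. The field names here are assumptions based on this description, not the actual schema; only the nodes_method table it queries is documented elsewhere in this FAQ.

```yaml
# .codegraph/tools.yaml (illustrative; field names assumed)
tools:
  - name: largest_methods
    description: "Top N methods by line count"
    sql: >
      SELECT name, end_line - start_line AS loc
      FROM nodes_method
      ORDER BY loc DESC
      LIMIT $limit
    parameters:
      - name: limit
        type: integer
        default: 10
```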

What is the interface documentation sync tool?

codegraph_docs_sync runs the 5-phase interface documentation sync composite (src/workflow/scenarios/interface_docs_sync_composite.py): Discovery → DocParse → Generation → DriftDetect → Report. It validates 6 interfaces (REST API, CLI, MCP, ACP, gRPC, WebSocket) against their documentation and reports drift types: UNDOCUMENTED, STALE, OUTDATED, COVERED.