Workflows Reference

Documentation for CodeGraph’s workflow system.

Workflow Architecture

CodeGraph uses LangGraph for workflow orchestration. All queries flow through a single entry point — MultiScenarioCopilot — which classifies intent and routes to the appropriate scenario workflow.

                    ┌──────────────────┐
                    │   Entry Point    │
                    │   (User Query)   │
                    └────────┬─────────┘
                             │
                    ┌────────▼─────────┐
                    │ Intent Classify  │
                    │ (Keyword + LLM)  │
                    └────────┬─────────┘
                             │
                    ┌────────▼─────────┐
                    │ Scenario Router  │
                    │ (21 scenarios)   │
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
     ┌────────▼───┐  ┌──────▼─────┐  ┌────▼────────┐
     │  Security  │  │ Onboarding │  │  Perf / ... │
     │  Workflow  │  │  Workflow  │  │  Workflows  │
     └────────┬───┘  └──────┬─────┘  └────┬────────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                    ┌────────▼─────────┐
                    │    Output        │
                    │  (Answer +       │
                    │   Evidence)      │
                    └──────────────────┘

MultiScenarioCopilot

The main entry point for all workflow execution.

Location: src/workflow/orchestration/copilot.py (re-exported from src/workflow/)

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()

# Auto-detect scenario from query
result = copilot.run("Find SQL injection vulnerabilities")

# Force a specific scenario
result = copilot.run(
    "Analyze this module",
    context={"scenario_id": "scenario_2"}
)

# Result structure
{
    'query': 'Find SQL injection vulnerabilities',
    'intent': 'security_audit',
    'scenario_id': 'scenario_2',
    'confidence': 0.92,
    'answer': 'Found 3 potential SQL injection...',
    'evidence': ['Function exec_simple_query at line 142...'],
    'metadata': {...}
}
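
Callers will often want to check the confidence before trusting the answer. The sketch below works on a dict shaped like the result structure above; summarize_result and the 0.5 threshold are hypothetical and not part of CodeGraph.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any], min_confidence: float = 0.5) -> str:
    """Render a copilot-style result dict as plain text (hypothetical helper)."""
    confidence = result.get("confidence") or 0.0
    if confidence < min_confidence:
        return f"Low-confidence classification ({confidence:.2f}); consider forcing a scenario."
    lines = [f"[{result.get('scenario_id')}] {result.get('answer')}"]
    # Evidence is a list of strings in the result structure shown above.
    for item in result.get("evidence") or []:
        lines.append(f"  - {item}")
    return "\n".join(lines)


sample = {
    "query": "Find SQL injection vulnerabilities",
    "intent": "security_audit",
    "scenario_id": "scenario_2",
    "confidence": 0.92,
    "answer": "Found 3 potential SQL injection...",
    "evidence": ["Function exec_simple_query at line 142..."],
    "metadata": {},
}
print(summarize_result(sample))
```

When the confidence is low, passing context={"scenario_id": ...} as shown earlier is one way to bypass classification.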

The copilot builds a LangGraph graph internally via build_multi_scenario_graph(), which wires together intent classification, scenario routing, and scenario-specific handler execution.

Workflow State

MultiScenarioState

All workflows share MultiScenarioState, a TypedDict that flows through the LangGraph nodes.

Location: src/workflow/state.py

from typing import Any, Dict, List, Optional, TypedDict

class MultiScenarioState(TypedDict):
    # Input
    query: str
    context: Optional[Dict[str, Any]]
    language: Optional[str]               # "en" or "ru"

    # Intent Classification
    intent: Optional[str]                 # e.g., "security_audit"
    scenario_id: Optional[str]            # e.g., "scenario_2"
    confidence: Optional[float]           # 0.0–1.0
    classification_method: Optional[str]  # "keyword" or "llm"

    # CPG Data
    cpg_results: Optional[List[Dict]]
    subsystems: Optional[List[str]]
    methods: Optional[List[Dict]]
    call_graph: Optional[Any]

    # Output
    answer: Optional[str]
    evidence: Optional[List[str]]
    metadata: Optional[Dict[str, Any]]
    retrieved_functions: Optional[List[str]]

    # Error Handling
    error: Optional[str]
    retry_count: int

    # Workflow Configuration
    enrichment_config: Optional[Dict[str, Any]]
    vector_store: Optional[Any]

Create initial state with the helper:

from src.workflow.state import create_initial_state

state = create_initial_state(
    query="Find memory leaks",
    language="en",
    context={"subsystem": "executor"}
)
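
To illustrate what such a helper plausibly does, here is a minimal sketch over a trimmed-down state. MiniState and make_initial_state are stand-ins invented for this example; the real create_initial_state may set different defaults.

```python
from typing import Any, Dict, Optional, TypedDict


class MiniState(TypedDict):
    """Trimmed-down stand-in for MultiScenarioState (illustration only)."""
    query: str
    context: Optional[Dict[str, Any]]
    language: Optional[str]
    intent: Optional[str]
    scenario_id: Optional[str]
    error: Optional[str]
    retry_count: int


def make_initial_state(
    query: str,
    language: Optional[str] = None,
    context: Optional[Dict[str, Any]] = None,
) -> MiniState:
    """Hypothetical factory: caller inputs are set, everything else is defaulted."""
    return MiniState(
        query=query,
        context=context,
        language=language,
        intent=None,        # filled in later by intent classification
        scenario_id=None,
        error=None,
        retry_count=0,      # counters start at zero
    )


state = make_initial_state(
    "Find memory leaks", language="en", context={"subsystem": "executor"}
)
```

Populating every key up front means downstream nodes can read any field without guarding against a missing key.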

Specialized States

Some scenarios extend the base state with additional fields:

State Class                Additional Fields
SecurityWorkflowState      vulnerabilities, taint_paths, security_findings, risk_score
PerformanceWorkflowState   hotspots, complexity_metrics, bottlenecks
ArchitectureWorkflowState  dependency_graph, layer_violations, module_coupling
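
Extending a TypedDict is done by subclassing. The sketch below shows the pattern with a two-field stand-in for the base state; the field names come from the table above, but their types and the total=False choice are assumptions.

```python
from typing import Any, Dict, List, Optional, TypedDict


class BaseState(TypedDict):
    """Stand-in for MultiScenarioState with just two of its fields."""
    query: str
    answer: Optional[str]


class SecurityState(BaseState, total=False):
    """Adds the security fields named in the table; the types are guesses."""
    vulnerabilities: List[Dict[str, Any]]
    taint_paths: List[List[str]]
    security_findings: List[Dict[str, Any]]
    risk_score: float


# Base keys stay required; the added keys may be filled in as the workflow runs.
state: SecurityState = {"query": "Find SQL injection", "answer": None, "risk_score": 0.0}
```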

Workflow Nodes

Intent Classification

The first node classifies the user query into a scenario using bilingual (EN/RU) keyword matching, with an optional LLM fallback when keywords are inconclusive.

Location: src/workflow/scenarios/_intent/

from src.workflow.orchestration import classify_intent_node

# Called internally by the graph
state = classify_intent_node(state)
# Populates: state['intent'], state['scenario_id'], state['confidence']
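
The keyword-first, LLM-second idea can be sketched in a few lines. Everything here is illustrative: the keyword table, the confidence formula, and the classify function are assumptions, not the contents of src/workflow/scenarios/_intent/.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical keyword stems (EN and RU) per scenario.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "scenario_2": ("vulnerab", "injection", "уязвим", "инъекц"),
    "scenario_3": ("performance", "slow", "производительн", "медленн"),
}

LLMClassifier = Callable[[str], Tuple[str, float]]


def classify(query: str, llm: Optional[LLMClassifier] = None) -> Dict[str, object]:
    """Keyword match first; call the optional LLM only when nothing matches."""
    text = query.lower()
    hits = {sid: sum(kw in text for kw in kws) for sid, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    if hits[best] > 0:
        # Toy confidence: more matched keywords means higher confidence.
        confidence = min(1.0, 0.5 + 0.25 * hits[best])
        return {"scenario_id": best, "confidence": confidence,
                "classification_method": "keyword"}
    if llm is not None:
        scenario_id, confidence = llm(query)
        return {"scenario_id": scenario_id, "confidence": confidence,
                "classification_method": "llm"}
    return {"scenario_id": None, "confidence": 0.0, "classification_method": "keyword"}
```

Matching on word stems rather than whole words lets one entry cover inflected forms, which matters more for Russian than for English.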

Scenario Routing

After classification, the router dispatches to the matched scenario workflow.

from src.workflow.orchestration import route_by_intent

# Returns the scenario node name
next_node = route_by_intent(state)
# e.g., "security_workflow", "onboarding_workflow"
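
A router of this kind is essentially a lookup from scenario to node name. In the sketch below the node names are taken from the Available Scenarios table, while the scenario_N key format, the fallback node, and the route function itself are assumptions.

```python
from typing import Dict, Optional

# Hypothetical routing table covering three of the scenarios.
ROUTES: Dict[str, str] = {
    "scenario_1": "onboarding_workflow",
    "scenario_2": "security_workflow",
    "scenario_3": "performance_workflow",
}
DEFAULT_NODE = "onboarding_workflow"  # assumed fallback for unknown scenarios


def route(state: Dict[str, Optional[str]]) -> str:
    """Map the classified scenario to a node name, falling back when unknown."""
    scenario_id = state.get("scenario_id")
    return ROUTES.get(scenario_id or "", DEFAULT_NODE)
```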

Scenario Execution

Each scenario workflow is a LangGraph subgraph that:

1. Queries the CPG database via CPGQueryService
2. Processes results through scenario-specific handlers
3. Formats the answer using localized formatters
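
The three steps can be sketched without LangGraph or a database. StubCPG stands in for CPGQueryService, and the templates and run_scenario function are invented for illustration; only the get_methods_by_subsystem name appears elsewhere in this reference.

```python
from typing import Dict, List


class StubCPG:
    """Stand-in for CPGQueryService; returns canned rows instead of querying."""

    def get_methods_by_subsystem(self, subsystem: str) -> List[Dict[str, str]]:
        return [{"name": "ExecInitNode", "subsystem": subsystem}]


# Hypothetical localized templates keyed by language code.
TEMPLATES = {
    "en": "Found {n} method(s) in {subsystem}",
    "ru": "Найдено методов: {n} (подсистема {subsystem})",
}


def run_scenario(cpg: StubCPG, subsystem: str, language: str = "en") -> Dict[str, object]:
    rows = cpg.get_methods_by_subsystem(subsystem)        # 1. query the CPG
    evidence = [row["name"] for row in rows]              # 2. scenario-specific processing
    template = TEMPLATES.get(language, TEMPLATES["en"])   # 3. localized formatting
    return {
        "answer": template.format(n=len(rows), subsystem=subsystem),
        "evidence": evidence,
    }
```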

Scenario Workflows

Structure

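Most scenarios live in a src/workflow/scenarios/{name}_handlers/ package; a few use a plain directory (security/, onboarding/) or a single module (for example audit_composite.py):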

src/workflow/scenarios/
├── _base/                    # Base handler class
│   └── handler.py            # BaseHandler
├── _intent/                  # Intent classification
├── security/                 # Security scenario
│   ├── handlers/
│   ├── formatters/
│   └── __init__.py
├── onboarding/               # Onboarding scenario
│   ├── handlers/
│   └── __init__.py
├── architecture_handlers/    # Architecture scenario
│   ├── handlers/
│   ├── formatters/
│   └── __init__.py
├── performance_handlers/     # Performance scenario
├── refactoring_handlers/     # Refactoring scenario
├── code_review_handlers/     # Code review scenario
├── compliance_handlers/      # Compliance scenario
├── documentation_handlers/   # Documentation scenario
├── tech_debt_handlers/       # Tech debt scenario
├── debugging_handlers/       # Debugging scenario
├── concurrency_handlers/     # Concurrency scenario
├── coverage_handlers/        # Test coverage scenario
├── cross_repo_handlers/      # Cross-repo scenario
├── feature_dev_handlers/     # Feature development scenario
├── audit_composite.py        # Audit (runs 9 sub-scenarios)
├── code_optimization.py      # Code optimization
├── file_editing.py           # File editing
├── pattern_search_handlers/  # Structural pattern search scenario
│   ├── handlers/
│   └── __init__.py
├── standards_check.py        # Standards check
└── dependencies_analysis.py  # Dependency analysis

Available Scenarios

ID  Name                     Entry Point                       Purpose
01  onboarding               onboarding_workflow               Codebase onboarding and navigation
02  security                 security_workflow                 Vulnerability detection
03  performance              performance_workflow              Performance and complexity
04  architecture             architecture_workflow             Architectural analysis
05  refactoring              refactoring_workflow              Refactoring assistance
06  documentation            documentation_workflow            Documentation generation
07  compliance               compliance_workflow               Compliance checking
08  code_review              code_review_workflow              Code review automation
09  tech_debt                tech_debt_workflow                Tech debt quantification
10  cross_repo               cross_repo_workflow               Cross-repo impact analysis
11  debugging                debugging_workflow                Debugging support
12  concurrency              concurrency_workflow              Concurrency analysis
13  coverage                 test_coverage_workflow            Test coverage analysis
14  feature_dev              feature_dev_workflow              Feature development
15  security_incident        security_incident_workflow        Incident response
16  large_scale_refactoring  large_scale_refactoring_workflow  Enterprise-scale refactoring
17  file_editing             file_editing_workflow             AST-based file editing
18  code_optimization        optimization_workflow             Code optimization
19  standards_check          standards_check_workflow          Standards-guided optimization
20  dependencies             dependencies_workflow             Dependency analysis
21  pattern_search           pattern_search_workflow           Structural pattern search with CPG constraints
--  audit                    AuditRunner                       Composite: 12-dimension quality audit

Handler Base Class

All scenario handlers inherit from BaseHandler:

Location: src/workflow/scenarios/_base/handler.py

from src.workflow.scenarios._base.handler import BaseHandler
from src.workflow.scenarios._base.handler import HandlerResult

class MyHandler(BaseHandler):
    async def handle(self) -> HandlerResult:
        # self.cpg   — CPGQueryService instance
        # self.state  — MultiScenarioState dict
        # self.cfg    — Unified config
        # self.query  — Original query string
        # self.language — "en" or "ru"
        results = self.cpg.get_methods_by_subsystem("executor")
        return HandlerResult(
            answer="Found methods...",
            evidence=results,
            metadata={"handler": "my_handler"}
        )

Warning: Do NOT use AnalysisHandler from src/workflow/handlers/analysis.py as a base class — its constructor signature is incompatible with the scenario registry.
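
Because handle() is a coroutine, synchronous callers need an event loop to run it. The sketch below shows the calling pattern with stand-in classes; FakeResult and FakeHandler are invented here, and the constructor signature in particular is an assumption about BaseHandler.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeResult:
    """Stand-in for HandlerResult with the three fields used above."""
    answer: str
    evidence: List[Any] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


class FakeHandler:
    """Stand-in for a BaseHandler subclass; the constructor is an assumption."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self.query = state["query"]

    async def handle(self) -> FakeResult:
        return FakeResult(answer=f"Handled: {self.query}", metadata={"handler": "fake"})


# asyncio.run creates an event loop, awaits the coroutine, and closes the loop.
result = asyncio.run(FakeHandler({"query": "Find memory leaks"}).handle())
```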

Composite Workflows

Three composite orchestrators run sub-scenarios in parallel or sequentially:

Composite                Scenario IDs                        Mode        Timeout
code_optimization (S18)  02, 05, 06, 11, 12                  Parallel    60s
standards_check (S19)    08, 17, 18                          Sequential  45s
audit                    02, 03, 05, 06, 07, 08, 11, 12, 16  Parallel    600s
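
Parallel mode with a timeout can be sketched with asyncio. This is not the orchestrators' code: run_parallel is hypothetical, and since the table does not say whether the timeout applies to each sub-scenario or to the composite as a whole, the sketch applies it per sub-scenario.

```python
import asyncio
from typing import Awaitable, Callable, Dict

SubScenario = Callable[[], Awaitable[str]]


async def run_parallel(subs: Dict[str, SubScenario], timeout_s: float) -> Dict[str, str]:
    """Run every sub-scenario concurrently; a slow or failing one does not sink the rest."""

    async def guarded(name: str, fn: SubScenario) -> str:
        try:
            return await asyncio.wait_for(fn(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return f"{name}: timed out"
        except Exception as exc:  # keep partial results from the other scenarios
            return f"{name}: failed ({exc})"

    results = await asyncio.gather(*(guarded(n, f) for n, f in subs.items()))
    return dict(zip(subs.keys(), results))


async def fast() -> str:
    return "ok"


async def slow() -> str:
    await asyncio.sleep(1.0)
    return "ok"


outcome = asyncio.run(run_parallel({"02": fast, "05": slow}, timeout_s=0.05))
```

Catching errors inside each guarded task is what lets the composite report partial results instead of failing outright.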

Audit runs 9 sub-scenarios in parallel, covering 12 code quality dimensions (security, complexity, duplication, dependencies, naming, error handling, testing, documentation, performance, portability, style, architecture). It is exposed through the CLI:

python -m src.cli audit --db PATH [--language ru] [--format json]
python -m src.cli audit --db PATH --autofix  # Audit + autofix suggestions

The --autofix flag generates automated fix suggestions for security vulnerabilities found during the audit, using AutofixEngine on taint paths. It is configured in the autofix section of config.yaml.

Exec provides non-interactive CI/CD execution with PR security review:

python -m src.cli exec --prompt "Review security" --base-ref origin/main \
    --sarif-file out.sarif --comment-file comment.md --sandbox read-only

The exec pipeline collects the changed files, scans the changed methods, computes a “New vs Fixed” delta via fingerprinting, and generates SARIF 2.1.0 output (via SARIFExporter) together with PR comment markdown. It is configured in the reporting section of config.yaml.
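
A fingerprint-based delta boils down to set differences over stable hashes. The sketch below is one way to do it, not the pipeline's actual algorithm: the fingerprint function, the fields it hashes, and the finding shape are all assumptions.

```python
import hashlib
from typing import Dict, Iterable, Set


def fingerprint(finding: Dict[str, str]) -> str:
    """Hash fields that survive line shifts; the field choice is an assumption."""
    key = "|".join((finding["rule_id"], finding["file"], finding["method"]))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def delta(
    base: Iterable[Dict[str, str]], head: Iterable[Dict[str, str]]
) -> Dict[str, Set[str]]:
    """Findings only in head are new; findings only in base are fixed."""
    base_fps = {fingerprint(f) for f in base}
    head_fps = {fingerprint(f) for f in head}
    return {"new": head_fps - base_fps, "fixed": base_fps - head_fps}
```

Leaving line numbers out of the hash keeps a finding's identity stable when unrelated edits move the code up or down.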

Conflict resolution uses priority mode with security (1.5x) and compliance (1.3x) boosts. It is configured in the composition section of config.yaml.
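
As a worked example of how such boosts could decide a conflict, the sketch below multiplies each finding's score by its scenario's boost and keeps the highest. The 1.5 and 1.3 multipliers come from the text; the finding shape, the score field, and the resolve function are hypothetical.

```python
from typing import Any, Dict, List

# Multipliers quoted in the text; scenarios not listed keep a neutral 1.0.
BOOSTS: Dict[str, float] = {"security": 1.5, "compliance": 1.3}


def resolve(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the finding with the highest boosted score (hypothetical data shape)."""

    def boosted(finding: Dict[str, Any]) -> float:
        return float(finding["score"]) * BOOSTS.get(finding["scenario"], 1.0)

    return max(findings, key=boosted)


conflicting = [
    {"scenario": "performance", "score": 0.8, "advice": "inline the query builder"},
    {"scenario": "security", "score": 0.6, "advice": "keep parameterized queries"},
]
winner = resolve(conflicting)
```

Here the security finding wins despite its lower raw score, because 0.6 boosted by 1.5 is roughly 0.9 and that exceeds the unboosted 0.8.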

Error Handling

Retry Logic

Workflows support automatic retry with query refinement:

# Built into the graph — configurable via state['retry_count']
# Default: up to 2 retries with adaptive query refinement
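
The retry behaviour can be pictured as a bounded loop that rewrites the query between attempts. The sketch below is an illustration only: run_with_retry, execute, and refine are hypothetical names, and the real graph implements this with nodes and edges rather than a Python loop.

```python
from typing import Callable, Dict, Optional

MAX_RETRIES = 2  # matches the documented default


def run_with_retry(
    state: Dict[str, object],
    execute: Callable[[str], Optional[str]],
    refine: Callable[[str, int], str],
) -> Dict[str, object]:
    """Try once, then up to MAX_RETRIES more times with a refined query."""
    query = str(state["query"])
    for attempt in range(MAX_RETRIES + 1):
        answer = execute(query)
        if answer is not None:
            return {**state, "answer": answer, "retry_count": attempt, "error": None}
        if attempt < MAX_RETRIES:
            # Hand the failed query to the refiner before the next attempt.
            query = refine(query, attempt + 1)
    return {
        **state,
        "answer": None,
        "retry_count": MAX_RETRIES,
        "error": "no result after retries",
    }
```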

Fallback Strategies

When LLM-based generation fails, workflows fall back to template-based query matching, and then to a direct CPG method call:

# Automatic fallback chain:
# 1. LLM-generated SQL query
# 2. Template-matched query from query examples
# 3. Direct CPG method call
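
A chain like this amounts to trying strategies in order until one yields rows. The sketch below simulates the three stages with stub functions; run_fallback_chain and the stubs are hypothetical and only illustrate the control flow.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[str, Callable[[str], Optional[list]]]


def run_fallback_chain(query: str, strategies: List[Strategy]) -> Tuple[str, list]:
    """Return (strategy_name, rows) from the first strategy that yields rows."""
    errors = []
    for name, strategy in strategies:
        try:
            rows = strategy(query)
        except Exception as exc:  # a failing stage hands over to the next one
            errors.append(f"{name}: {exc}")
            continue
        if rows:
            return name, rows
    raise RuntimeError("all strategies failed: " + "; ".join(errors))


def llm_sql(query: str) -> list:
    raise TimeoutError("LLM unavailable")  # simulate stage 1 failing


def template_match(query: str) -> Optional[list]:
    return None  # simulate no matching template


def direct_call(query: str) -> list:
    return [{"name": "palloc"}]


used, rows = run_fallback_chain(
    "Find memory leaks",
    [("llm_sql", llm_sql), ("template", template_match), ("direct", direct_call)],
)
```

Returning the name of the strategy that succeeded makes it easy to record in metadata which path produced the answer.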

Custom Workflows

Creating a Custom Workflow

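from langgraph.graph import StateGraph
from src.workflow.state import MultiScenarioState

# Placeholder nodes: each receives the current state and returns a partial update.
def my_analyze_node(state: MultiScenarioState) -> dict:
    return {"intent": "custom"}

def my_process_node(state: MultiScenarioState) -> dict:
    return {"cpg_results": []}

def my_interpret_node(state: MultiScenarioState) -> dict:
    return {"answer": "...", "evidence": []}

def create_custom_workflow():
    workflow = StateGraph(MultiScenarioState)

    # Register the nodes
    workflow.add_node("analyze", my_analyze_node)
    workflow.add_node("process", my_process_node)
    workflow.add_node("interpret", my_interpret_node)

    # Linear flow: analyze -> process -> interpret
    workflow.add_edge("analyze", "process")
    workflow.add_edge("process", "interpret")

    workflow.set_entry_point("analyze")
    workflow.set_finish_point("interpret")

    return workflow.compile()

result = create_custom_workflow().invoke({"query": "..."})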

Conditional Routing

def route_by_intent(state: MultiScenarioState) -> str:
    if state["intent"] == "find_vulnerabilities":
        return "security_node"
    elif state["intent"] == "find_performance":
        return "performance_node"
    else:
        return "general_node"

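# Call this before compile(); it assumes nodes named "security", "performance",
# and "general" have already been added to the graph.
workflow.add_conditional_edges(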
    "analyze",
    route_by_intent,
    {
        "security_node": "security",
        "performance_node": "performance",
        "general_node": "general"
    }
)

Streaming

Progress streaming is supported through the LangGraph streaming interface:

copilot = MultiScenarioCopilot()

# Streaming is handled at the API/TUI layer
# See src/api/routers/ for WebSocket streaming
# See src/tui/ for terminal streaming

Next Steps