Composite Workflows

Architecture and usage guide for scenario composition and orchestration.

Overview

Composite workflows allow scenarios to act as orchestrators, invoking multiple sub-scenarios for comprehensive analysis. This enables complex, multi-faceted code analysis without requiring users to run scenarios individually.

Key Benefits

  • Comprehensive Analysis: Combine security, performance, and quality checks
  • Deduplication: Automatically merge and deduplicate findings
  • Conflict Resolution: Resolve conflicting recommendations
  • Priority Scoring: Rank findings by combined priority metrics
  • Single Entry Point: One command for complete analysis

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                        COMPOSITE WORKFLOW ARCHITECTURE                       │
│                                                                              │
│  User Query: "Optimize src/"                                                 │
│        │                                                                     │
│        ▼                                                                     │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                    ORCHESTRATOR (S18 or S19)                            │ │
│  │                                                                          │ │
│  │  ┌─────────────────────────────────────────────────────────────────┐   │ │
│  │  │                   SCENARIO INVOKER                               │   │ │
│  │  │                                                                   │   │ │
│  │  │  Parallel Execution:                                              │   │ │
│  │  │  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐                    │   │ │
│  │  │  │ S02 │  │ S05 │  │ S06 │  │ S11 │  │ S12 │                    │   │ │
│  │  │  │Sec. │  │Refac│  │Perf │  │Arch │  │Debt │                    │   │ │
│  │  │  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘                    │   │ │
│  │  │     │        │       │        │        │                         │   │ │
│  │  │     └────────┴───────┴────────┴────────┘                         │   │ │
│  │  │                      │                                            │   │ │
│  │  │              Sub-Scenario Results                                 │   │ │
│  │  └──────────────────────┼───────────────────────────────────────────┘   │ │
│  │                         │                                                │ │
│  │  ┌──────────────────────▼───────────────────────────────────────────┐   │ │
│  │  │                    RESULT MERGER                                  │   │ │
│  │  │                                                                   │   │ │
│  │  │  Strategy: union | intersection | weighted | consensus            │   │ │
│  │  │  - Deduplication (similarity threshold: 0.8)                      │   │ │
│  │  │  - Source tracking                                                │   │ │
│  │  │  - Metadata preservation                                          │   │ │
│  │  └──────────────────────┼───────────────────────────────────────────┘   │ │
│  │                         │                                                │ │
│  │  ┌──────────────────────▼───────────────────────────────────────────┐   │ │
│  │  │                  CONFLICT RESOLVER                                │   │ │
│  │  │                                                                   │   │ │
│  │  │  Mode: priority_based | security_first | interactive              │   │ │
│  │  │  - Detect conflicting recommendations                             │   │ │
│  │  │  - Apply resolution rules                                         │   │ │
│  │  │  - Log resolution decisions                                       │   │ │
│  │  └──────────────────────┼───────────────────────────────────────────┘   │ │
│  │                         │                                                │ │
│  │  ┌──────────────────────▼───────────────────────────────────────────┐   │ │
│  │  │                 PRIORITY CALCULATOR                               │   │ │
│  │  │                                                                   │   │ │
│  │  │  Algorithm: weighted_sum | risk_based | custom                    │   │ │
│  │  │  Weights: severity(0.3) + impact(0.25) + roi(0.2)                 │   │ │
│  │  │           + confidence(0.15) + consensus(0.1)                     │   │ │
│  │  └──────────────────────┼───────────────────────────────────────────┘   │ │
│  │                         │                                                │ │
│  └─────────────────────────┼────────────────────────────────────────────────┘ │
│                            │                                                  │
│                            ▼                                                  │
│                   UNIFIED FINDINGS                                            │
│                   (sorted by priority)                                        │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

S18 as Orchestrator

Scenario 18 (Code Optimization) orchestrates multiple analysis scenarios for comprehensive code improvement:

Sub-Scenarios

| Scenario | Name | Weight | Contribution |
|----------|------|--------|--------------|
| S02 | Security Audit | 1.5 | Security vulnerabilities |
| S05 | Refactoring | 1.0 | Code smells, dead code |
| S06 | Performance | 1.2 | Performance bottlenecks |
| S11 | Architecture | 1.1 | Architecture violations |
| S12 | Tech Debt | 0.9 | Technical debt items |
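
As a rough sketch of how these weights can feed into the weighted merge, each finding's raw score is scaled by its source scenario's weight before ranking. The weight values come from the table above; the function shape and the notion of a single base score are illustrative assumptions, not the real merger API:

```python
# Per-scenario weights as listed in the S18 sub-scenario table.
SCENARIO_WEIGHTS = {
    "scenario_02": 1.5,  # Security Audit
    "scenario_05": 1.0,  # Refactoring
    "scenario_06": 1.2,  # Performance
    "scenario_11": 1.1,  # Architecture
    "scenario_12": 0.9,  # Tech Debt
}

def weighted_score(scenario_id: str, base_score: float) -> float:
    """Scale a finding's base score by its source scenario's weight.

    Unknown scenarios fall back to a neutral weight of 1.0.
    """
    return base_score * SCENARIO_WEIGHTS.get(scenario_id, 1.0)
```

Under this sketch, a performance finding with base score 0.5 would rank at 0.6, while the same score from a security scenario would rank at 0.75.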

Execution Flow

S18 Optimization Query
        │
        ├─→ S02: Security Analysis
        ├─→ S05: Refactoring Analysis
        ├─→ S06: Performance Analysis
        ├─→ S11: Architecture Analysis
        └─→ S12: Tech Debt Analysis
                │
                ▼
        Merge Findings (weighted)
                │
                ▼
        Resolve Conflicts
                │
                ▼
        Calculate Priorities
                │
                ▼
        Unified Optimization Report

Example Usage

# CLI
python -m src.cli composition run "Optimize src/core/" -o s18

# MCP
/select 18
> Analyze src/core/ for comprehensive optimization

Output

╭─────────────── Composite Optimization Results ────────────────╮
│                                                                │
│  Sub-scenarios invoked: S02, S05, S06, S11, S12                │
│  Findings before merge: 87                                     │
│  Findings after merge: 52                                      │
│  Duplicates removed: 35                                        │
│  Conflicts resolved: 3                                         │
│  Execution time: 4.2s                                          │
│                                                                │
│  Top Findings:                                                 │
│                                                                │
│  1. [Critical] SQL injection vulnerability                     │
│     Source: S02 (Security)                                     │
│     Priority Score: 0.95                                       │
│                                                                │
│  2. [High] Performance bottleneck in loop                      │
│     Source: S06 (Performance)                                  │
│     Priority Score: 0.87                                       │
│                                                                │
│  3. [High] Architecture layer violation                        │
│     Source: S11 (Architecture)                                 │
│     Priority Score: 0.82                                       │
│                                                                │
╰────────────────────────────────────────────────────────────────╯

S19 as Orchestrator

Scenario 19 (Standards Check) orchestrates compliance checking with document-driven rules:

Sub-Scenarios

| Scenario | Name | Role | Contribution |
|----------|------|------|--------------|
| S08 | Compliance | Required | Standard compliance rules, pattern matching |
| S17 | File Editing | Optional | AST-based fix application |
| S18 | Code Optimization | Optional | Optimized fix recommendations |

S08 is always invoked as the core compliance engine. S17 and S18 are optionally integrated — S17 for applying fixes, S18 for optimization passes on the proposed changes.

Document Enrichment

Standards Document
        │
        ▼
    Rule Extraction
        │
        ├─→ S08: Compliance Analysis (required)
        │         │
        │         ▼
        │   Violations Found
        │         │
        │         ▼
        ├─→ S17: Fix Application (optional)
        │         │
        │         ▼
        └─→ S18: Optimize Fixes (optional)
                  │
                  ▼
        Violations with References

Example Usage

# CLI
python -m src.cli composition run "Check against OWASP standards" -o s19

# With custom standards document
python -m src.cli composition run "Check against company_standards.yaml" -o s19

Code Quality Audit

The audit is the most comprehensive composite workflow. It runs 9 sub-scenarios across 12 code quality dimensions with multi-layered false positive filtering.

Sub-Scenarios

| Scenario | Name | Contribution |
|----------|------|--------------|
| S02 | Security Audit | Vulnerabilities, taint analysis |
| S03 | Documentation | Comment and documentation quality |
| S05 | Refactoring | Code smells, dead code |
| S06 | Performance | Performance bottlenecks |
| S07 | Test Coverage | Testability and coverage gaps |
| S08 | Compliance | Coding standards, regulatory compliance |
| S11 | Architecture | Modularity, circular dependencies |
| S12 | Tech Debt | Accumulated debt, refactoring ROI |
| S16 | Entry Points | Attack surface, external inputs |

12 Quality Dimensions

| # | Dimension | Sources |
|---|-----------|---------|
| 1 | Readability & coding standards | S05 |
| 2 | Module structure & component reuse | S11 |
| 3 | Redundant & hard-to-maintain code | S05, S12 |
| 4 | Scalability | S06, S11 |
| 5 | Performance with large data volumes | S06 |
| 6 | Architecture for scalability | S11 |
| 7 | Security vulnerabilities | S02, S08 |
| 8 | SQL injection, XSS & data leaks | S02, S16 |
| 9 | Maintainability | S12, S05 |
| 10 | Test coverage & testability | S07 |
| 11 | Complexity & dependencies | S11, S12 |
| 12 | Documentation & comments | S03 |

Execution Flow

Audit Query
        │
        ▼
  AuditRunner.run()
        │
        ▼
  AuditRunner._collect_metrics()
        │
        ├─→ S02: Security
        ├─→ S03: Documentation
        ├─→ S05: Refactoring
        ├─→ S06: Performance
        ├─→ S07: Tests
        ├─→ S08: Compliance
        ├─→ S11: Architecture
        ├─→ S12: Tech Debt
        └─→ S16: Entry Points
                │
                ▼
        False Positive Filtering (V25-V30)
                │
                ▼
        Dead Code Counting
                │
                ▼
        Merge + Deduplication
                │
                ▼
        Report across 12 Dimensions

Dead Code False Positive Filtering

A multi-layered filtering system (V25–V30) progressively reduces the false positive rate:

| Version | Filter | Description |
|---------|--------|-------------|
| V25 | is_test | Exclude test methods |
| V26 | Class-aware reachability | Methods called via class instance |
| V26b | Inheritance-aware | Base classes with alive subclass methods |
| V27 | is_nested | Exclude nested functions |
| V28 | All-dead module exclusion | Modules where all methods are dead |
| V29 | Low-vitality exclusion | Modules with ≥5 methods and <40% alive |
| V30 | Alive-file companion | FILE-level functions in files with alive code |
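
The layered idea can be pictured as a chain of predicates, where any layer may mark a dead-code candidate as a false positive. The candidate fields below (is_test, called_via_instance, module vitality counters) are illustrative assumptions, not the actual AuditRunner data model:

```python
def is_false_positive(candidate: dict) -> bool:
    """Return True if any filter layer marks the candidate as a FP.

    Each lambda loosely corresponds to one of the V25-V30 layers.
    """
    filters = [
        lambda c: c.get("is_test", False),              # V25: test methods
        lambda c: c.get("called_via_instance", False),  # V26: class-aware reachability
        lambda c: c.get("alive_in_subclass", False),    # V26b: inheritance-aware
        lambda c: c.get("is_nested", False),            # V27: nested functions
        lambda c: c.get("module_all_dead", False),      # V28: all-dead module
        # V29: low-vitality module (>=5 methods and <40% alive)
        lambda c: c.get("module_methods", 0) >= 5
        and c.get("module_alive_ratio", 1.0) < 0.4,
    ]
    return any(f(candidate) for f in filters)
```

A candidate survives to the report only if every layer declines to filter it, which is why each added version lowers the FP rate monotonically.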

Example Usage

# CLI
python -m src.cli audit --db data/projects/codegraph.duckdb --language en

# With autofix suggestions
python -m src.cli audit --db data/projects/codegraph.duckdb --autofix

# JSON format for CI/CD
python -m src.cli audit --db data/projects/codegraph.duckdb --format json --output report.json

Configuration

composition:
  orchestrators:
    audit:
      sub_scenarios:
        - scenario_02
        - scenario_03
        - scenario_05
        - scenario_06
        - scenario_07
        - scenario_08
        - scenario_11
        - scenario_12
        - scenario_16
      parallel_execution: true
      timeout_seconds: 600
      max_findings_per_scenario: 50
      deduplicate_paths: true

User Story Validation

Story validation checks that each completed user story from USER_STORIES.md is accessible through at least one interface.

Checked Interfaces

| Interface | CPG Path | Description |
|-----------|----------|-------------|
| CLI | src/cli/ | CLI commands |
| REST API | src/api/routers/ | REST endpoints |
| MCP | src/mcp/ | MCP server tools |
| ACP | src/acp/ | Agent communication protocol |
| gRPC | src/services/gocpg/grpc_transport.py | Go CPG gRPC transport |

Evidence Types

| Symbol | Type | Confidence | Description |
|--------|------|------------|-------------|
| + | dedicated | ≥0.8 | Dedicated endpoint |
| ~ | passthrough | ≤0.5 | Generic gateway (e.g. /chat) |
| * | scenario_map | 0.7 | From scenario-to-interface mapping |
| - | not found | | Not found |
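
One way to picture how a story's overall verdict could be derived is to take the strongest evidence symbol across interfaces. This is a sketch only: the evidence encoding and the representative confidence values (0.8 and 0.5 stand in for the ≥0.8 and ≤0.5 bounds above) are assumptions:

```python
# Representative confidence per evidence symbol, per the table above.
CONFIDENCE = {"+": 0.8, "*": 0.7, "~": 0.5, "-": 0.0}

def story_verdict(evidence: list[str]) -> tuple[str, float]:
    """Pick the strongest evidence symbol for a story.

    An empty evidence list means the story was not found anywhere.
    """
    best = max(evidence, key=lambda s: CONFIDENCE[s], default="-")
    return best, CONFIDENCE[best]
```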

Example Usage

# Basic run
python -m src.cli dogfood validate-stories --db data/projects/codegraph.duckdb

# With Go CPG
python -m src.cli dogfood validate-stories \
    --db data/projects/codegraph.duckdb \
    --go-db data/projects/gocpg.duckdb \
    --output report.md

Interface Documentation Sync

Interface documentation sync is a composite workflow that scans 6 interfaces for documentation drift. It runs a 5-phase pipeline to discover code entities, parse existing docs, detect mismatches, and generate a coverage report.

Checked Interfaces

| Interface | Code Path | Doc Path |
|-----------|-----------|----------|
| REST API | src/api/routers/ | docs/api/{lang}/REST_API.md |
| CLI | src/cli/, src/api/cli.py | docs/guides/{lang}/CLI_GUIDE.md |
| MCP | src/mcp/ | docs/api/{lang}/MCP_TOOLS.md |
| ACP | src/acp/ | docs/api/{lang}/ACP_INTEGRATION.md |
| gRPC | src/services/gocpg/grpc_transport.py | docs/api/{lang}/GRPC_API.md |
| WebSocket | src/api/websocket/routes.py, src/api/routers/dashboard_ws.py | docs/api/{lang}/WEBSOCKET_API.md |

5-Phase Pipeline

Discovery            Doc Parsing          Generation           Drift Detection      Report
     │                    │                    │                    │                  │
     ▼                    ▼                    ▼                    ▼                  ▼
  Scan code          Parse markdown       Generate missing     Match code↔docs    Markdown/JSON
  entities           for documented       doc stubs            Multi-strategy      Coverage %
  per interface      entities             (optional)           matching            per interface

Drift Categories

| Category | Description |
|----------|-------------|
| UNDOCUMENTED | Code entity exists but has no documentation |
| STALE | Documented entity no longer exists in code |
| OUTDATED | Both exist but parameters/signatures differ |
| COVERED | Properly documented |
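
The categories reduce to a set comparison between entity names discovered in code and names parsed from docs. The sketch below assumes matching has already resolved name variants; OUTDATED needs a signature comparison, approximated here by a caller-supplied predicate:

```python
def classify(
    code: set[str],
    docs: set[str],
    signatures_differ=lambda name: False,  # stand-in for real signature diffing
) -> dict[str, str]:
    """Assign a drift category to every entity seen in code or docs."""
    result = {}
    for name in code | docs:
        if name not in docs:
            result[name] = "UNDOCUMENTED"  # in code only
        elif name not in code:
            result[name] = "STALE"         # in docs only
        elif signatures_differ(name):
            result[name] = "OUTDATED"      # both exist, signatures drifted
        else:
            result[name] = "COVERED"
    return result
```

Coverage per interface then follows directly: the share of code entities classified COVERED (or OUTDATED, depending on how strictly coverage is defined).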

Matching Strategies

The drift detector uses a multi-strategy matching pipeline (Phase 4):

  1. Exact match — direct name comparison (confidence: 1.0)
  2. Route-aware match — strip route prefixes like /api/v1 (confidence: 0.95)
  3. Case-normalized match — snake_case ↔ kebab-case equivalence (confidence: 0.95)
  4. Fuzzy match — Jaccard similarity on name tokens (confidence: similarity score)
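
Strategies 2–4 can be sketched in a few lines. The prefix list matches the route_prefix_strip configuration shown below; the tokenization on "_"/"-" is an assumption about how the real detector splits names:

```python
import re

def strip_route_prefix(route: str,
                       prefixes=("/api/v1", "/api/v2", "/api")) -> str:
    """Route-aware match: drop the first matching prefix (order matters)."""
    for p in prefixes:
        if route.startswith(p):
            return route[len(p):]
    return route

def normalize(name: str) -> str:
    """Case-normalized match: treat snake_case and kebab-case as equal."""
    return name.lower().replace("-", "_")

def jaccard(a: str, b: str) -> float:
    """Fuzzy match: Jaccard similarity over normalized name tokens."""
    ta = set(re.split(r"[_\-]", normalize(a)))
    tb = set(re.split(r"[_\-]", normalize(b)))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
```

With the fuzzy_threshold of 0.6 from the configuration below, a pair like get_user_profile / get-user (similarity 2/3) would still match, while mostly unrelated names would not.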

Exclude patterns (regex) can filter entities from drift detection. Configuration:

composition:
  orchestrators:
    interface_docs_sync:
      drift_detection:
        route_prefix_strip: ["/api/v1", "/api/v2", "/api"]
        exclude_patterns: ["^internal_", "^_"]
        case_normalize: true
        fuzzy_threshold: 0.6
        min_coverage_warning: 0.8

Example Usage

# Full report
python -m src.cli docs-sync --db data/projects/codegraph.duckdb

# CI mode (exit 1 if coverage below threshold)
python -m src.cli docs-sync --check --format json

# Filter interfaces
python -m src.cli docs-sync --interfaces rest_api,cli --language en

# REST API
curl -X POST /api/v1/documentation/sync -d '{"interfaces": ["rest_api", "cli"]}'

# MCP tool
codegraph_docs_sync(interfaces="rest_api,cli", output_format="json")

CI Integration

The GitHub Actions workflow .github/workflows/docs-sync.yml runs on PRs that modify interface code or docs. It posts a sticky PR comment with coverage metrics and uploads the report as an artifact.

Composition Components

ScenarioInvoker

Invokes sub-scenarios with parallel or sequential execution:

from src.workflow.composition import ScenarioInvoker

invoker = ScenarioInvoker()

# Parallel execution (default)
results = invoker.invoke_parallel(
    scenarios=["scenario_02", "scenario_06", "scenario_11"],
    state=state,
    timeout=30.0,
)

# Sequential execution
results = invoker.invoke_sequential(
    scenarios=["scenario_02", "scenario_06"],
    state=state,
)
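
Conceptually, the parallel mode fans sub-scenarios out to a worker pool and collects each result under a timeout. The sketch below is illustrative, not the actual ScenarioInvoker implementation; run_scenario stands in for the real per-scenario entry point:

```python
from concurrent.futures import ThreadPoolExecutor

def invoke_parallel(scenarios, run_scenario, timeout=30.0):
    """Run each scenario concurrently and gather results by scenario id.

    A scenario exceeding the timeout raises TimeoutError when its
    result is collected, which an orchestrator could catch per scenario.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(scenarios))) as pool:
        futures = {s: pool.submit(run_scenario, s) for s in scenarios}
        return {s: f.result(timeout=timeout) for s, f in futures.items()}
```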

ResultMerger

Merges findings with configurable strategies:

from src.workflow.composition import ResultMerger, MergeStrategy

merger = ResultMerger()

# Merge with union strategy (include all)
result = merger.merge(
    scenario_results={"s02": findings_s02, "s06": findings_s06},
    strategy=MergeStrategy.UNION,
)

# Merge with weighted strategy
result = merger.merge(
    scenario_results=all_findings,
    strategy=MergeStrategy.WEIGHTED,
)

Merge Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| union | Include all findings | Comprehensive analysis |
| intersection | Only findings from 2+ scenarios | High-confidence only |
| weighted | Weight by scenario priority | Prioritized analysis |
| consensus | Findings with agreement | Conservative approach |
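
Whatever the strategy, merging ends with deduplication at the configured similarity threshold (0.8). As a minimal sketch, difflib stands in here for whatever similarity measure ResultMerger actually uses, and findings are reduced to plain strings:

```python
from difflib import SequenceMatcher

def dedup(findings: list[str], threshold: float = 0.8) -> list[str]:
    """Greedy dedup: keep a finding only if it is not too similar
    to any finding already kept."""
    kept: list[str] = []
    for f in findings:
        if all(SequenceMatcher(None, f, k).ratio() < threshold for k in kept):
            kept.append(f)
    return kept
```

With preserve_sources enabled, the real merger would also record which scenarios contributed each surviving finding rather than discarding duplicates outright.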

ConflictResolver

Resolves conflicting recommendations:

from src.workflow.composition import ConflictResolver

resolver = ConflictResolver()

# Resolve conflicts
resolved_findings, resolution_log = resolver.resolve_conflicts(
    findings=unified_findings,
    mode="priority_based",
)

Conflict Types

| Type | Description | Resolution |
|------|-------------|------------|
| Same Location | Multiple findings for same code | Keep highest priority |
| Contradictory | Conflicting recommendations | Apply resolution rules |
| Overlapping | Partially overlapping findings | Merge or choose |
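
The same-location rule is the simplest of the three: group findings by location and keep the highest-priority one. The (location, priority, message) tuple shape is an assumption for illustration:

```python
def resolve_same_location(findings):
    """Keep only the highest-priority finding per code location.

    findings: iterable of (location, priority, message) tuples.
    Returns a mapping of location -> winning message.
    """
    best = {}
    for loc, priority, msg in findings:
        if loc not in best or priority > best[loc][0]:
            best[loc] = (priority, msg)
    return {loc: msg for loc, (priority, msg) in best.items()}
```

In priority_based mode this decides the winner directly; in security_first mode, security findings would first have their priority boosted (security_priority_boost in the configuration below) before the same comparison runs.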

PriorityCalculator

Calculates combined priority scores:

from src.workflow.composition import PriorityCalculator

calculator = PriorityCalculator()

# Calculate and sort by priority
prioritized = calculator.sort_by_priority(findings)

# Get priority breakdown
breakdown = calculator.get_priority_breakdown(finding)

Priority Weights

| Factor | Weight | Description |
|--------|--------|-------------|
| severity | 0.30 | Finding severity level |
| impact | 0.25 | Potential impact score |
| roi | 0.20 | Return on investment |
| confidence | 0.15 | Detection confidence |
| consensus | 0.10 | Cross-scenario agreement |
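
The weighted_sum algorithm reduces to a dot product of factor scores with these weights; since the weights sum to 1.0, a finding scoring 1.0 on every factor gets priority 1.0. Assuming factor scores normalized to the 0..1 range:

```python
# Weights as listed in the table above (sum to 1.0).
WEIGHTS = {
    "severity": 0.30,
    "impact": 0.25,
    "roi": 0.20,
    "confidence": 0.15,
    "consensus": 0.10,
}

def priority_score(factors: dict[str, float]) -> float:
    """Weighted sum of factor scores; missing factors count as 0."""
    return sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())
```

For example, a critical security finding with severity 1.0, impact 0.9, roi 0.8, confidence 0.95, and consensus 0.5 scores roughly 0.88, consistent with the 0.95/0.87/0.82 range shown in the sample output above.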

Configuration

Configure composition in config.yaml:

composition:
  enabled: true

  orchestrators:
    scenario_18:
      sub_scenarios:
        - scenario_02  # Security
        - scenario_05  # Refactoring
        - scenario_06  # Performance
        - scenario_11  # Architecture
        - scenario_12  # Tech Debt
      parallel_execution: true
      timeout_seconds: 60
      enabled: true

    scenario_19:
      sub_scenarios:
        - scenario_08  # Compliance (required)
      optional_sub_scenarios:
        - scenario_17  # File Editing (optional)
        - scenario_18  # Code Optimization (optional)
      parallel_execution: false
      timeout_seconds: 30
      enabled: true

  merging:
    strategy: weighted  # union, intersection, weighted, consensus
    deduplication_threshold: 0.8
    max_findings: 100
    preserve_sources: true

  priority:
    algorithm: weighted_sum  # weighted_sum, risk_based, custom
    weights:
      severity: 0.30
      impact: 0.25
      roi: 0.20
      confidence: 0.15
      consensus: 0.10

  conflicts:
    resolution_mode: priority_based  # priority_based, security_first, interactive
    security_priority_boost: 1.5
    compliance_priority_boost: 1.3
    log_resolutions: true

API Endpoints

POST /api/v1/composition/query

Execute a composite workflow:

POST /api/v1/composition/query
Content-Type: application/json
Authorization: Bearer <token>

{
  "query": "Optimize src/core/",
  "orchestrator": "scenario_18",
  "context": {
    "language": "en",
    "file_paths": ["src/core/"]
  },
  "parallel": true,
  "merge_strategy": "weighted"
}

Response:

{
  "session_id": "sess_abc123",
  "answer": "Found 52 optimization opportunities...",
  "unified_findings": [],
  "priority_scores": {},
  "sub_scenario_results": {},
  "conflicts_resolved": 3,
  "execution_time_ms": 4234.5,
  "metadata": {}
}

POST /api/v1/composition/apply

Apply a pending edit from a session:

POST /api/v1/composition/apply
Content-Type: application/json
Authorization: Bearer <token>

{
  "session_id": "sess_abc123",
  "finding_id": "find_xyz789",
  "preview": true
}

GET /api/v1/composition/conflicts/{session_id}

Get conflict information for a session:

GET /api/v1/composition/conflicts/sess_abc123
Authorization: Bearer <token>

GET /api/v1/composition/session/{session_id}

Get full session state:

GET /api/v1/composition/session/sess_abc123
Authorization: Bearer <token>

DELETE /api/v1/composition/session/{session_id}

Delete a session and its state:

DELETE /api/v1/composition/session/sess_abc123
Authorization: Bearer <token>

GET /api/v1/composition/config

Get composition configuration:

GET /api/v1/composition/config
Authorization: Bearer <token>

GET /api/v1/composition/scenarios

List available scenarios for composition:

GET /api/v1/composition/scenarios
Authorization: Bearer <token>

CLI Commands

# Run composite workflow
python -m src.cli composition run "<query>" -o s18|s19

# Run with specific sub-scenarios
python -m src.cli composition run "Analyze src/" -o s18 -s scenario_02 -s scenario_06

# Run with merge strategy
python -m src.cli composition run "Optimize code" -o s18 --merge-strategy weighted

# Apply pending edit
python -m src.cli composition apply <finding_id> -s <session_id>

# Preview edit before applying
python -m src.cli composition apply <finding_id> -s <session_id> --preview

# View conflicts
python -m src.cli composition conflicts -s <session_id>

# View configuration
python -m src.cli composition config

# List available scenarios
python -m src.cli composition scenarios

Best Practices

1. Choose the Right Orchestrator

  • Audit for comprehensive analysis: Full check across 12 quality dimensions (9 sub-scenarios, 600s timeout)
  • S18 for optimization: When you need performance, security, and quality improvements (5 sub-scenarios)
  • S19 for compliance: When you need standards verification with references (S08 required + optional S17/S18)
  • Interface Docs Sync for documentation coverage: When you need to detect undocumented endpoints and stale docs (6 interfaces, 120s timeout)

2. Configure Merge Strategy

# For comprehensive analysis (include everything)
merging:
  strategy: union

# For high-confidence findings only
merging:
  strategy: intersection

# For prioritized analysis
merging:
  strategy: weighted

3. Handle Conflicts

Enable conflict logging to understand resolution decisions:

conflicts:
  log_resolutions: true

4. Optimize Performance

For large codebases, use parallel execution with timeout:

orchestrators:
  scenario_18:
    parallel_execution: true
    timeout_seconds: 120

5. Customize Priority Weights

Adjust weights based on your team’s priorities:

priority:
  weights:
    severity: 0.40  # Prioritize severity
    impact: 0.20
    roi: 0.20
    confidence: 0.10
    consensus: 0.10

Project Lifecycle Operations

In addition to composite analysis workflows, CodeGraph provides project lifecycle management across all interfaces:

Available Operations

| Operation | CLI | MCP | REST API |
|-----------|-----|-----|----------|
| List projects | projects list | codegraph_project_list | GET /api/v1/projects |
| Switch project | projects activate <name> | codegraph_project_switch | POST /api/v1/projects/{id}/activate |
| Rename project | projects rename <old> <new> | codegraph_project_rename | PUT /api/v1/projects/{id} |
| Delete project | projects delete <name> | codegraph_project_delete | DELETE /api/v1/projects/{id} |

Data Cleanup on Delete

Deletion can optionally remove associated data:

  • DuckDB files — CLI --delete-files, API implicitly via import registry
  • ChromaDB collections — CLI --delete-collections, API ?delete_collections=true, MCP delete_data=true
  • Confirmation prompt — CLI requires --yes / -y to skip interactive confirmation when deleting data

RBAC Permissions

| Operation | Required Role |
|-----------|---------------|
| List / Switch | GroupRole.VIEWER |
| Rename / Update | GroupRole.EDITOR |
| Delete | GroupRole.ADMIN |
| Delete with data | GroupRole.ADMIN |
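
The table implies a simple role hierarchy where higher roles inherit lower-role permissions. The GroupRole names come from the table; the numeric ranking itself is an assumption about how such a check might be implemented:

```python
# Assumed ordering: ADMIN > EDITOR > VIEWER.
RANK = {"VIEWER": 0, "EDITOR": 1, "ADMIN": 2}

def has_permission(user_role: str, required_role: str) -> bool:
    """A user may perform an operation if their role ranks at least
    as high as the operation's required role."""
    return RANK[user_role] >= RANK[required_role]
```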