Overview
CodeGraph includes an automated STRIDE threat model generator that builds threat models directly from the Code Property Graph (CPG). Instead of manually drawing data flow diagrams and enumerating threats in spreadsheets, CodeGraph extracts DFD components, data flows, and trust boundaries from the parsed codebase, then classifies threats using the STRIDE methodology with 43 CWE-to-STRIDE mappings.
The feature produces structured output in multiple formats (JSON, Markdown, GOST, SARIF 2.1.0, Mermaid DFD) and supports incremental updates so that the threat model stays current as code evolves.
Key capabilities:
- Automatic DFD extraction from CPG entry points, call graph, taint sources/sinks
- Five trust boundary types detected via heuristic pattern matching
- 43 CWEs mapped to 6 STRIDE categories (MITRE Top 25 + OWASP Top 10)
- 18 standard mitigations plus 14 CWE-specific recommendations
- GOST R 56939-2024 process 5.7 compliance (artifacts 5.7.3.1 through 5.7.3.7)
- Incremental delta computation between model versions
- Bilingual export (English and Russian)
Source files: src/security/threat_model/
| File | Purpose |
|---|---|
| models.py | Data models: ThreatModel, STRIDEThreat, DataComponent, DataFlow, TrustBoundary, ThreatModelDelta |
| dfd_builder.py | Extracts DFD from CPG tables (nodes_method, edges_call) |
| trust_boundary.py | Detects 5 trust boundary types from entry point categories |
| stride_classifier.py | Maps CWE findings and CPG patterns to STRIDE categories |
| mitigation.py | Recommends countermeasures per STRIDE category and CWE |
| exporter.py | Exports to JSON, Markdown (EN/RU), GOST, SARIF 2.1.0, Mermaid DFD |
| incremental.py | Computes delta between old and new models |
STRIDE Methodology
STRIDE is a threat classification framework developed by Microsoft. Each letter represents a category of threat:
| Category | Enum Value | Description | Security Property Violated |
|---|---|---|---|
| Spoofing | spoofing | Impersonating a user, system, or component | Authentication |
| Tampering | tampering | Unauthorized modification of data or code | Integrity |
| Repudiation | repudiation | Denying an action without proof otherwise | Non-repudiation |
| Information Disclosure | info_disclosure | Exposing data to unauthorized parties | Confidentiality |
| Denial of Service | dos | Degrading or denying availability | Availability |
| Elevation of Privilege | elevation | Gaining unauthorized access or capabilities | Authorization |
CodeGraph maps 43 CWEs to these 6 categories. Some CWEs map to multiple categories
(e.g., CWE-79 maps to both Tampering and Information Disclosure). The full mapping is
defined in CWE_TO_STRIDE inside stride_classifier.py.
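As a rough illustration of what such a mapping could look like, the sketch below models a three-entry excerpt. The enum values follow the "Enum Value" column of the STRIDE table; the class name `STRIDECategory`, the name `CWE_TO_STRIDE_EXCERPT`, and the helper `categories_for` are hypothetical and may not match what stride_classifier.py actually defines.

```python
from enum import Enum


class STRIDECategory(Enum):
    # Values mirror the "Enum Value" column of the STRIDE table.
    SPOOFING = "spoofing"
    TAMPERING = "tampering"
    REPUDIATION = "repudiation"
    INFO_DISCLOSURE = "info_disclosure"
    DOS = "dos"
    ELEVATION = "elevation"


# Illustrative excerpt only; the real CWE_TO_STRIDE table has 43 entries.
CWE_TO_STRIDE_EXCERPT = {
    "CWE-287": [STRIDECategory.SPOOFING],
    "CWE-89": [STRIDECategory.TAMPERING],
    # A CWE may map to more than one category.
    "CWE-79": [STRIDECategory.TAMPERING, STRIDECategory.INFO_DISCLOSURE],
}


def categories_for(cwe_id):
    """Return the STRIDE categories for a CWE ID, or an empty list if unmapped."""
    return CWE_TO_STRIDE_EXCERPT.get(cwe_id, [])
```

A finding tagged CWE-79 would therefore yield two threats under this sketch, one per category.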
Architecture
The threat model pipeline consists of six stages executed in sequence:
```mermaid
graph LR
    CPG[(CPG DuckDB)] --> DFD[DFD Builder]
    Domain[Domain Plugin] --> DFD
    DFD --> TB[Trust Boundary Detector]
    TB --> SC[STRIDE Classifier]
    Hyp[Hypothesis Results] --> SC
    SC --> MR[Mitigation Recommender]
    MR --> EXP[Exporter]
    EXP --> JSON[JSON]
    EXP --> MD[Markdown]
    EXP --> GOST[GOST Report]
    EXP --> SARIF[SARIF 2.1.0]
    EXP --> MER[Mermaid DFD]
    PrevModel[Previous Model] --> INC[Incremental Updater]
    EXP --> INC
    INC --> Delta[ThreatModelDelta]
```
Stage 1: DFD Builder (dfd_builder.py)
The DFDBuilder class extracts a Data Flow Diagram from the CPG. It queries three
categories of components:
- Processes: Modules containing entry points (from nodes_method WHERE is_entry_point = true), grouped at file-level granularity.
- Data Stores: Inferred from taint sink categories (database, file, cache, log, network_send) via the domain plugin or built-in fallback.
- External Entities: Inferred from taint source categories (network, user_input, file_system, environment, ipc) with UNTRUSTED trust level.
Data flows are extracted from the call graph (edges_call) between entry-point modules,
plus synthetic flows from external entities to processes with matching categories.
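The grouping and flow extraction described above can be sketched in plain Python. This is a simplified stand-in that works on in-memory rows rather than querying DuckDB; the row keys `name` and `filename`, the `process:` ID prefix, and both function names are assumptions made for the example, and only `is_entry_point` comes from the text.

```python
from collections import defaultdict


def processes_from_methods(method_rows):
    """Group entry-point methods into one DFD process per source file."""
    by_file = defaultdict(list)
    for row in method_rows:
        if row["is_entry_point"]:
            by_file[row["filename"]].append(row["name"])
    return [
        {"id": f"process:{filename}", "type": "process", "entry_points": sorted(names)}
        for filename, names in sorted(by_file.items())
    ]


def flows_from_calls(call_edges, process_files):
    """Keep call edges that connect two different entry-point modules."""
    flows = set()
    for caller_file, callee_file in call_edges:
        if (
            caller_file != callee_file
            and caller_file in process_files
            and callee_file in process_files
        ):
            flows.add((caller_file, callee_file))
    return sorted(flows)
```

Calls into files without entry points, and calls within a single file, are dropped in this sketch because they do not connect two DFD processes.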
Each component receives a TrustLevel:
- UNTRUSTED (0) – network-facing, socket, user input, protocol handlers
- PARTIALLY_TRUSTED (1) – auth, connection handlers
- TRUSTED (2) – internal modules
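One way to represent these levels is an integer enum, so that levels can be compared directly. The numeric values come from the list above; the category sets and the `trust_level_for` helper are hypothetical and only approximate how a component might be assigned a level.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    UNTRUSTED = 0
    PARTIALLY_TRUSTED = 1
    TRUSTED = 2


# Hypothetical category sets, loosely based on the descriptions above.
UNTRUSTED_CATEGORIES = {"network", "socket", "user_input", "protocol"}
PARTIALLY_TRUSTED_CATEGORIES = {"auth", "connection"}


def trust_level_for(entry_category):
    """Map an entry point category to a trust level; unknown categories are internal."""
    if entry_category in UNTRUSTED_CATEGORIES:
        return TrustLevel.UNTRUSTED
    if entry_category in PARTIALLY_TRUSTED_CATEGORIES:
        return TrustLevel.PARTIALLY_TRUSTED
    return TrustLevel.TRUSTED
```

Because `IntEnum` members order like integers, a check such as `source_level < target_level` is enough to tell whether data moves toward a more trusted component.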
Stage 2: Trust Boundary Detector (trust_boundary.py)
The TrustBoundaryDetector identifies five types of trust boundaries:
| Boundary Type | Indicators | Entry Categories |
|---|---|---|
| network | recv, accept, listen, http_handler, grpc_handler, serve | network, socket, protocol, http_handler |
| auth | authenticate, authorize, check_permission, verify_token, login | auth, connection |
| ffi | cgo_call, ctypes, cffi, jni, ffi_call | extension |
| process | exec, subprocess, popen, system, spawn | exec |
| file_system | open, read_file, write_file, fopen, readdir | file_access |
Detection is heuristic: it matches component names and entry point categories against indicator lists. Custom indicators can be provided via configuration.
After detection, the detector marks data flows that cross trust boundaries. A flow crosses a boundary when the source and target have different trust levels, or when one is inside and the other outside a boundary.
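The sketch below shows one plausible reading of both steps: indicator matching on component names, then the boundary-crossing test. The indicator lists are copied from the table above; the substring matching strategy, the `Component` class, and both function names are assumptions, so the real detector may match differently.

```python
from dataclasses import dataclass

BOUNDARY_INDICATORS = {
    "network": ["recv", "accept", "listen", "http_handler", "grpc_handler", "serve"],
    "auth": ["authenticate", "authorize", "check_permission", "verify_token", "login"],
    "ffi": ["cgo_call", "ctypes", "cffi", "jni", "ffi_call"],
    "process": ["exec", "subprocess", "popen", "system", "spawn"],
    "file_system": ["open", "read_file", "write_file", "fopen", "readdir"],
}


@dataclass
class Component:
    name: str
    trust_level: int  # 0 = untrusted, 1 = partially trusted, 2 = trusted


def detect_boundary_types(component_name):
    """Return boundary types whose indicators occur in the name (substring match)."""
    name = component_name.lower()
    return [
        boundary_type
        for boundary_type, indicators in BOUNDARY_INDICATORS.items()
        if any(indicator in name for indicator in indicators)
    ]


def crosses_boundary(source, target, boundary_members):
    """A flow crosses if trust levels differ or exactly one endpoint is inside."""
    if source.trust_level != target.trust_level:
        return True
    return (source.name in boundary_members) != (target.name in boundary_members)
```

Plain substring matching over-matches: a name containing `popen` also contains `open`, so it is reported as both a process and a file_system boundary. That is one reason heuristic results are worth reviewing, and why custom indicator lists can help.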
Stage 3: STRIDE Classifier (stride_classifier.py)
The STRIDEClassifier generates threats from four sources:
- Hypothesis findings – CWE-based security findings are mapped to STRIDE categories via the CWE_TO_STRIDE dictionary.
- CPG pattern findings – Rows from cpg_pattern_findings are classified by their CWE IDs, or by pattern name heuristics when CWEs are absent.
- Boundary crossing inference – Unencrypted flows across trust boundaries produce Information Disclosure threats; user input across boundaries produces Tampering threats; credential flows without auth boundaries produce Spoofing threats.
- Unprotected entry points – Network-facing entry points not inside an auth boundary produce Spoofing threats (CWE-306).
The classifier deduplicates threats by (category, affected_component, cwe_ids) and
ranks them by risk score (severity × likelihood).
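Deduplication and ranking can be illustrated with a small sketch. The dedup key follows the tuple named above; the `Threat` dataclass, keeping the first occurrence of a duplicate, and sorting the CWE IDs inside the key are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    category: str
    affected_component: str
    cwe_ids: tuple
    risk_score: int  # severity value multiplied by likelihood value


def dedupe_and_rank(threats):
    """Drop duplicates by (category, component, CWEs), then sort by risk, highest first."""
    unique = {}
    for threat in threats:
        key = (threat.category, threat.affected_component, tuple(sorted(threat.cwe_ids)))
        # Assumption: the first threat seen for a key wins.
        unique.setdefault(key, threat)
    return sorted(unique.values(), key=lambda t: t.risk_score, reverse=True)
```

Sorting the CWE IDs inside the key means two findings that list the same CWEs in a different order still collapse into one threat.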
Stage 4: Mitigation Recommender (mitigation.py)
The MitigationRecommender provides two layers of recommendations:
- Category-based: 18 standard mitigations across 6 STRIDE categories (3 per category), each with an ID (e.g., M-S-1, M-T-2), title, and description.
- CWE-specific: 14 CWE-specific mitigations for the most common vulnerabilities (CWE-89, CWE-79, CWE-78, CWE-120, CWE-287, CWE-200, CWE-352, CWE-502, CWE-319, CWE-400, CWE-862, CWE-787, CWE-416, CWE-476).
Mitigation output is prioritized by severity: critical/high threats get all mitigations, medium gets 4, low gets 2.
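The severity-based cutoff could be expressed as below. The limits (all, 4, 2) come from the sentence above; the `MITIGATION_LIMITS` table, the function name, and the assumption that candidates arrive already ordered by relevance are illustrative only.

```python
# None means "no limit": critical and high threats keep every mitigation.
MITIGATION_LIMITS = {"critical": None, "high": None, "medium": 4, "low": 2}


def select_mitigations(severity, candidates):
    """Trim an ordered list of mitigation IDs according to threat severity."""
    limit = MITIGATION_LIMITS[severity]
    if limit is None:
        return list(candidates)
    return list(candidates[:limit])
```

For example, a medium threat with five candidate mitigations would keep the first four, while a low threat would keep the first two.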
Stage 5: Exporter (exporter.py)
The ThreatModelExporter supports five output formats:
| Format | Method | Description |
|---|---|---|
| JSON | to_json() / to_json_string() | Full model as JSON dict or formatted string |
| Markdown | to_markdown(language="en"\|"ru") | Report with summary tables, DFD, threat list, mitigations |
| GOST | to_gost(language="ru") | GOST R 56939-2024 artifacts 5.7.3.1-5.7.3.4 |
| SARIF 2.1.0 | to_sarif() | OASIS SARIF format for IDE/CI integration |
| Mermaid DFD | to_mermaid_dfd() | Mermaid diagram with trust boundary subgraphs |
The Mermaid DFD renderer uses distinct shapes for component types:
- Processes: (name) (rounded)
- Data Stores: [(name)] (cylinder)
- External Entities: [/name\] (trapezoid)
- Encrypted flows: solid arrows with “encrypted” label
- Boundary-crossing flows: dashed arrows
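To make the shape conventions concrete, here is a minimal sketch that emits Mermaid node and edge syntax for each case. The function names and the `kind` strings are hypothetical, and combining a dashed arrow with an "encrypted" label when a flow is both encrypted and boundary-crossing is an assumption, since the list above does not cover that case.

```python
NODE_SHAPES = {
    "process": "{id}({label})",            # rounded
    "data_store": "{id}[({label})]",       # cylinder
    "external_entity": "{id}[/{label}\\]",  # trapezoid
}


def mermaid_node(node_id, label, kind):
    """Render one DFD component as a Mermaid node definition."""
    return NODE_SHAPES[kind].format(id=node_id, label=label)


def mermaid_edge(src, dst, encrypted=False, crosses_boundary=False):
    """Render one data flow; dashed if it crosses a boundary, labeled if encrypted."""
    arrow = "-.->" if crosses_boundary else "-->"
    label = "|encrypted|" if encrypted else ""
    return f"{src} {arrow}{label} {dst}"
```

Joining the resulting lines under a `graph LR` header gives a diagram that Mermaid tooling can render.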
Stage 6: Incremental Updater (incremental.py)
The IncrementalThreatModelUpdater compares a previous and newly generated model,
producing a ThreatModelDelta with:
- added_threats – threats present only in the new model
- removed_threats – threats present only in the old model
- modified_threats – threats with the same ID but changed severity, status, category, CWEs, or affected component
- added_components / removed_components – DFD component changes
The delta is stored in the merged model’s metadata.delta_summary for audit trail purposes.
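The threat comparison can be sketched as a set difference on threat IDs plus a field-by-field check. The compared fields follow the description of modified_threats; representing each model as a dict of threat ID to threat dict, and the exact key names inside those dicts, are assumptions for the example.

```python
COMPARED_FIELDS = ("severity", "status", "category", "cwe_ids", "affected_component")


def compute_threat_delta(old_threats, new_threats):
    """Compare two {threat_id: threat_dict} mappings and list the changed IDs."""
    old_ids, new_ids = set(old_threats), set(new_threats)
    modified = [
        threat_id
        for threat_id in old_ids & new_ids
        if any(
            old_threats[threat_id].get(field) != new_threats[threat_id].get(field)
            for field in COMPARED_FIELDS
        )
    ]
    return {
        "added_threats": sorted(new_ids - old_ids),
        "removed_threats": sorted(old_ids - new_ids),
        "modified_threats": sorted(modified),
    }
```

Component changes could be computed the same way on component IDs; they are left out here to keep the sketch short.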
CLI Usage
All threat model commands are under the threat-model subcommand group.
Generate a full threat model
```bash
python -m src.cli threat-model generate \
  --db data/projects/myproject.duckdb \
  --format json \
  --output threat_model.json
```
Options:
| Flag | Default | Description |
|---|---|---|
| --db | Active project DB | Path to DuckDB CPG database |
| --format | json | Output format: json, markdown, gost, sarif, mermaid |
| --output | stdout | Output file path |
| --language | en | Language for Markdown/GOST: en or ru |
| --include-mitigations | true | Include mitigation recommendations |
| --hypothesis-results | none | Path to JSON file with hypothesis findings |
Incremental update
```bash
python -m src.cli threat-model update \
  --db data/projects/myproject.duckdb \
  --previous threat_model_v1.json \
  --output threat_model_v2.json \
  --changed-files src/auth/login.c src/net/handler.c
```
Produces the updated model plus a delta summary showing added, removed, and modified threats.
Export DFD only
```bash
python -m src.cli threat-model dfd \
  --db data/projects/myproject.duckdb \
  --format mermaid \
  --output dfd.mmd
```
Generates just the Data Flow Diagram in Mermaid format, without running STRIDE classification.
List threats
```bash
python -m src.cli threat-model list \
  --db data/projects/myproject.duckdb \
  --severity high,critical \
  --category spoofing,elevation \
  --format json
```
Filters and lists threats from the current model, useful for CI pipelines.
API Endpoints
Seven REST endpoints are available under /api/v1/security/threat-model/.
POST /generate
Generate a full threat model for the active project.
```bash
curl -X POST http://localhost:8000/api/v1/security/threat-model/generate \
  -H "Content-Type: application/json" \
  -H "X-Project-Id: myproject" \
  -d '{
    "include_mitigations": true,
    "hypothesis_results_path": null
  }'
```
Response: full ThreatModel JSON object.
GET /export
Export the current threat model in a specified format.
```bash
curl "http://localhost:8000/api/v1/security/threat-model/export?format=markdown&language=en"
```
Query parameters: format (json|markdown|gost|sarif|mermaid), language (en|ru).
GET /dfd
Return the Data Flow Diagram for the active project.
```bash
curl "http://localhost:8000/api/v1/security/threat-model/dfd?format=mermaid"
```
Returns Mermaid source text or JSON component/flow structure depending on format.
GET /threats
List threats with optional filtering.
```bash
curl "http://localhost:8000/api/v1/security/threat-model/threats?severity=critical,high&category=spoofing"
```
Query parameters: severity, category, status, cwe_id. Returns a filtered list.
POST /update
Incremental update from a previous model.
```bash
curl -X POST http://localhost:8000/api/v1/security/threat-model/update \
  -H "Content-Type: application/json" \
  -d '{
    "previous_model": { ... },
    "changed_files": ["src/auth/login.c"]
  }'
```
Response includes model (updated) and delta (changes).
GET /mitigations
Get mitigation recommendations for all threats or a specific threat ID.
```bash
curl "http://localhost:8000/api/v1/security/threat-model/mitigations?threat_id=TM-spoofing-CWE-287-hyp-1"
```
GET /stride-mapping
Return the CWE-to-STRIDE mapping table.
```bash
curl "http://localhost:8000/api/v1/security/threat-model/stride-mapping"
```
Returns { "CWE-89": ["tampering"], "CWE-287": ["spoofing"], ... } with all 43 entries.
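A client may want the inverse view, listing CWEs per STRIDE category. The helper below is a hypothetical client-side sketch that takes a dict shaped like the response described above; it is not part of the API.

```python
from collections import defaultdict


def group_by_category(mapping):
    """Invert {cwe_id: [categories]} into {category: [sorted cwe_ids]}."""
    grouped = defaultdict(list)
    for cwe_id, categories in mapping.items():
        for category in categories:
            grouped[category].append(cwe_id)
    return {category: sorted(cwe_ids) for category, cwe_ids in grouped.items()}
```

A CWE that maps to several categories appears once under each of them in the inverted result.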
MCP Tools
Two MCP tools are exposed for IDE and agent integration.
codegraph_threat_model_generate
Generates a threat model for the active project.
```json
{
  "tool": "codegraph_threat_model_generate",
  "arguments": {
    "format": "json",
    "language": "en",
    "include_mitigations": true
  }
}
```
Returns the complete threat model in the requested format.
codegraph_threat_model_dfd
Returns the Data Flow Diagram.
```json
{
  "tool": "codegraph_threat_model_dfd",
  "arguments": {
    "format": "mermaid"
  }
}
```
Returns Mermaid DFD source or JSON structure with components, flows, and trust boundaries.
GOST R 56939-2024 Compliance
The threat model feature implements process 5.7 (Threat Modeling) from GOST R 56939-2024.
The to_gost() exporter produces four artifacts required by the standard:
| Artifact | GOST Section | Content |
|---|---|---|
| Threat model table | 5.7.3.1 | STRIDE-classified threats with severity, CWE, component, status |
| Mitigation list | 5.7.3.2 | Prioritized countermeasures per threat |
| Attack surface description | 5.7.3.3 | Entry points, trust boundaries, boundary-crossing flows |
| Research targets | 5.7.3.4 | High-risk components ranked by critical/high threat count |
The ThreatModel.compliance_score property calculates the ratio of mitigated threats
to total threats, per GOST 5.7 requirements.
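The ratio itself is simple to sketch. The status string `"mitigated"`, the function signature, and returning 0.0 for a model with no threats are all assumptions; the actual property on ThreatModel may treat these cases differently.

```python
def compliance_score(threat_statuses):
    """Return the share of threats whose status is 'mitigated', from 0.0 to 1.0."""
    total = len(threat_statuses)
    if total == 0:
        # Assumption: an empty model scores 0.0 rather than raising an error.
        return 0.0
    mitigated = sum(1 for status in threat_statuses if status == "mitigated")
    return mitigated / total
```

With two of four threats mitigated, this sketch returns 0.5.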
Per GOST R 56939-2024 section 5.7.2.4, the threat model must be updated when the codebase
changes. The incremental updater (incremental.py) fulfills this requirement by computing
deltas between model versions and recording the change context (changed files, previous
version) in the model metadata.
Configuration
Threat model settings are in config.yaml under the threat_model: section, backed by
ThreatModelConfig in unified_config.py.
```yaml
threat_model:
  enabled: true
  # Minimum severity to include in reports
  min_severity: low              # low | medium | high | critical
  # Include mitigations by default
  include_mitigations: true
  # Default export format
  default_format: json           # json | markdown | gost | sarif | mermaid
  # Default language for bilingual exports
  default_language: en           # en | ru
  # Trust boundary detection customization
  trust_boundary_detection:
    network_indicators: null     # Override default network indicators
    auth_indicators: null        # Override default auth indicators
    ffi_indicators: null         # Override default FFI indicators
  # Incremental update settings
  incremental:
    auto_update: false           # Auto-update on CPG re-parse
    store_history: true          # Store previous versions for delta
    max_history_versions: 10     # Max stored versions
  # GOST compliance
  gost:
    enabled: true                # Enable GOST artifact generation
    include_research_targets: true
```
Access configuration programmatically:
```python
from src.config import get_unified_config

cfg = get_unified_config()
cfg.threat_model.enabled
cfg.threat_model.min_severity
cfg.threat_model.default_format
cfg.threat_model.trust_boundary_detection
```
CWE-to-STRIDE Mapping Reference
The 43 CWEs are distributed across STRIDE categories as follows. Each CWE is listed once; CWEs that map to more than one category are noted under cross-category mappings.
Spoofing (7 CWEs): CWE-287, CWE-290, CWE-294, CWE-295, CWE-306, CWE-384, CWE-613
Tampering (10 CWEs): CWE-20, CWE-79, CWE-89, CWE-94, CWE-78, CWE-352, CWE-434, CWE-502, CWE-611, CWE-917
Repudiation (3 CWEs): CWE-778, CWE-223, CWE-532
Information Disclosure (8 CWEs): CWE-200, CWE-209, CWE-312, CWE-319, CWE-522, CWE-538, CWE-601, CWE-732
Denial of Service (5 CWEs): CWE-400, CWE-770, CWE-776, CWE-835, CWE-674
Elevation of Privilege (10 CWEs): CWE-250, CWE-269, CWE-276, CWE-863, CWE-862, CWE-120, CWE-416, CWE-476, CWE-787, CWE-190
Cross-category mappings: CWE-79 (Tampering + Info Disclosure), CWE-94 (Tampering + Elevation), CWE-78 (Tampering + Elevation), CWE-502 (Tampering + Elevation), CWE-611 (Tampering + Info Disclosure), CWE-532 (Repudiation + Info Disclosure), CWE-601 (Info Disclosure + Spoofing), CWE-476 (DoS + Elevation), CWE-190 (Elevation + DoS).
Risk Scoring
Each STRIDEThreat has a computed risk_score property:
```
risk_score = severity_value * likelihood_value
```
Severity values: critical=10, high=7, medium=4, low=1. Likelihood values: high=3, medium=2, low=1.
Maximum risk score: 30 (critical severity, high likelihood). Threats are sorted by risk score descending in classifier output.
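The arithmetic can be checked with a few lines of Python. The numeric values are the ones listed above; the table and function names are hypothetical and stand in for the property on STRIDEThreat.

```python
SEVERITY_VALUES = {"critical": 10, "high": 7, "medium": 4, "low": 1}
LIKELIHOOD_VALUES = {"high": 3, "medium": 2, "low": 1}


def risk_score(severity, likelihood):
    """Multiply the numeric severity by the numeric likelihood."""
    return SEVERITY_VALUES[severity] * LIKELIHOOD_VALUES[likelihood]
```

For instance, a high-severity threat with medium likelihood scores 7 × 2 = 14, which ranks it above a medium-severity threat with high likelihood (4 × 3 = 12).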
Integration with Hypothesis System
The STRIDE classifier consumes findings from the Security Hypothesis System (V2).
When hypothesis results are provided (via CLI --hypothesis-results flag or API request),
the classifier maps each finding’s CWE ID to STRIDE categories and creates threats with
full provenance (hypothesis finding ID in the evidence field).
This integration means the threat model benefits from the hypothesis system’s 58 CWE patterns and 27 CAPEC attack patterns, significantly increasing coverage beyond what static CPG pattern matching alone provides.
Examples
Generating a GOST-compliant report
```bash
python -m src.cli threat-model generate \
  --db data/projects/postgres.duckdb \
  --format gost \
  --language ru \
  --output threat_model_gost.md
```
Mermaid DFD in a CI pipeline
```bash
python -m src.cli threat-model dfd \
  --db data/projects/myapp.duckdb \
  --format mermaid \
  --output docs/dfd.mmd
# Convert to image (requires mmdc / mermaid-cli)
mmdc -i docs/dfd.mmd -o docs/dfd.svg
```
Incremental update after a code change
```bash
# 1. Re-parse changed files
./gocpg parse --input=src/ --output=myapp.duckdb --lang=c --incremental
# 2. Update threat model
python -m src.cli threat-model update \
  --db myapp.duckdb \
  --previous threat_model_v1.json \
  --output threat_model_v2.json \
  --changed-files src/auth/handler.c src/net/server.c
# 3. Check delta
python -c "
import json
with open('threat_model_v2.json') as f:
    model = json.load(f)
delta = model['metadata']['delta_summary']
print(f'Added: {delta[\"added_threats\"]}, Removed: {delta[\"removed_threats\"]}, Modified: {delta[\"modified_threats\"]}')
"
```