Structural Pattern Search lets a developer or security engineer find code that matches structural patterns with CPG-aware constraints, powered by GoCPG’s AST-based engine.
Table of Contents
- Quick Start
- How It Works
- Architecture
- Pattern Types
- YAML Rules
- CLI Usage
- REST API
- MCP Tools
- Data Models
- LLM Rule Generation
- Security Integration
- PatternTaintBridge
- SSR Autofix Bridge
- DB Tables
- Example Questions
- Related
Quick Start
/select pattern_search
Or via CLI:
# Ad-hoc pattern search
python -m src.cli patterns search 'malloc($x)' --lang c
# Scan with all rules
python -m src.cli patterns scan
# Scan specific rule
python -m src.cli patterns scan --rule unchecked-return
How It Works
Architecture
Structural Pattern Search is a standalone tool (not a LangGraph scenario) accessible via CLI, REST API, and MCP. The backend is GoCPG’s pattern engine, communicating over gRPC:
CLI / REST API / MCP
|
v
GoCPGClient (.scan(), .search(), .validate_rule())
| gRPC
v
gocpg binary (AST matching + CPG constraints)
|
v
DuckDB (cpg_pattern_results, cpg_pattern_rules)
|
v
PatternQueriesMixin (Python reads persisted results)
|
+---> PatternTaintBridge (enrich with taint paths)
+---> SSRAutofixBridge (generate security fixes)
GoCPGClient.scan() runs YAML rules against the CPG database. GoCPGClient.search() performs ad-hoc AST pattern matching. Results are persisted in cpg_pattern_results and queried by PatternQueriesMixin in src/services/cpg/pattern_queries.py.
Pattern Types
Syntactic patterns — tree-sitter CST patterns with metavariables:
| Metavariable | Matches |
|---|---|
| $VAR | Any single expression or identifier |
| $$ARGS | Zero or more arguments |
| $_ | Any node (wildcard) |
# Find malloc calls (single quotes keep the shell from expanding $x)
python -m src.cli patterns search 'malloc($x)' --lang c
# Find if-return without else
python -m src.cli patterns search 'if ($cond) { return $val; }' --lang c
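To make the three metavariable forms concrete, here is a minimal Python sketch that classifies the metavariables found in a pattern string. It only illustrates the notation in the table above; it is not how GoCPG parses patterns, and the helper name `classify_metavariables` is hypothetical.

```python
import re

# Hypothetical illustration of the metavariable notation, not GoCPG's parser.
# Alternatives are ordered so that `$$NAME` and `$_` are tried before `$NAME`.
_METAVAR = re.compile(
    r"\$\$(?P<multi>[A-Za-z_]\w*)"   # $$ARGS: zero or more arguments
    r"|\$(?P<wild>_)(?!\w)"          # $_: wildcard, any node
    r"|\$(?P<single>[A-Za-z_]\w*)"   # $VAR: single expression or identifier
)


def classify_metavariables(pattern: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs in the order they appear in the pattern."""
    found = []
    for match in _METAVAR.finditer(pattern):
        if match.group("multi"):
            found.append(("multi", match.group("multi")))
        elif match.group("wild"):
            found.append(("wildcard", "_"))
        else:
            found.append(("single", match.group("single")))
    return found


if __name__ == "__main__":
    print(classify_metavariables("$ret = $func($$args)"))
```

Run against the pattern from the `unchecked-return` rule, the sketch reports two single metavariables (`ret`, `func`) followed by one multi-argument metavariable (`args`).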
CPG-constrained patterns — patterns with data flow, call graph, type, and domain constraints:
id: unchecked-return
pattern: "$ret = $func($$args)"
language: c
constraints:
  - type: data_flow
    from: "$ret"
    not_reaches: "if ($ret"
  - type: call_graph
    callee: "$func"
    returns: "int"
message: "Return value of $func is not checked"
severity: warning
YAML Rules
Predefined rules live in configs/rules/: 190 rules across 14 languages. Domain-specific rules can be auto-loaded from the active domain plugin with --domain-rules.
CLI Usage
6 subcommands under python -m src.cli patterns:
# Scan with all rules
python -m src.cli patterns scan --db data/projects/test.duckdb
# Scan with severity filter and SARIF output
python -m src.cli patterns scan --severity error --format sarif --output results.sarif
# Scan with domain-specific rules auto-loaded
python -m src.cli patterns scan --domain-rules
# Scan with incremental evaluation
python -m src.cli patterns scan --incremental
# Scan a specific rule
python -m src.cli patterns scan --rule unchecked-return
# Ad-hoc pattern search
python -m src.cli patterns search 'malloc($SIZE)' --lang c --max-results 50
# Apply fixes (dry run)
python -m src.cli patterns fix --dry-run
# Apply fixes (with approval)
python -m src.cli patterns fix --rule unchecked-return
# List loaded pattern rules
python -m src.cli patterns list
# Show pattern statistics
python -m src.cli patterns stats
# Generate rule from natural language
python -m src.cli patterns generate "find unchecked return values" --lang c --output rule.yaml
Output formats: text (default), json, sarif. The fix subcommand uses ApprovalEngine for interactive approval before applying changes.
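When the CLI is driven from a script, building the argument list in Python avoids shell quoting problems with `$` in patterns. The sketch below uses only the flags shown in the examples above; the helper names `build_scan_argv` and `build_search_argv` are hypothetical and not part of the project.

```python
import sys


def build_scan_argv(db=None, rule=None, severity=None, output_format=None,
                    output=None, domain_rules=False, incremental=False):
    """Assemble argv for `patterns scan` from the flags documented above."""
    argv = [sys.executable, "-m", "src.cli", "patterns", "scan"]
    # Flags that take a value are added only when a value was supplied.
    for flag, value in (("--db", db), ("--rule", rule), ("--severity", severity),
                        ("--format", output_format), ("--output", output)):
        if value is not None:
            argv += [flag, str(value)]
    if domain_rules:
        argv.append("--domain-rules")
    if incremental:
        argv.append("--incremental")
    return argv


def build_search_argv(pattern, lang, max_results=None):
    """Assemble argv for `patterns search`; the pattern is passed verbatim."""
    argv = [sys.executable, "-m", "src.cli", "patterns", "search", pattern,
            "--lang", lang]
    if max_results is not None:
        argv += ["--max-results", str(max_results)]
    return argv


if __name__ == "__main__":
    print(build_scan_argv(severity="error", output_format="sarif",
                          output="results.sarif"))
    print(build_search_argv("malloc($SIZE)", "c", max_results=50))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from the repository root. Because no shell is involved, `malloc($SIZE)` reaches the CLI unchanged.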
REST API
6 endpoints mounted at /api/v1/patterns/:
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/patterns/search | Ad-hoc structural pattern search |
| GET | /api/v1/patterns/findings | Query persisted pattern findings (filters: rule_id, severity, filename, category) |
| GET | /api/v1/patterns/stats | Aggregated statistics by severity, category, rule |
| GET | /api/v1/patterns/rules | List loaded pattern rules from cpg_pattern_rules |
| POST | /api/v1/patterns/generate | LLM-generate a YAML rule from description |
| POST | /api/v1/patterns/fix | Apply SSR fixes (dry_run=true by default, approval required) |
Example:
# Search patterns
curl -X POST http://localhost:8000/api/v1/patterns/search \
-H "Content-Type: application/json" \
-d '{"pattern": "malloc($x)", "language": "c", "max_results": 50}'
# Get findings for a rule
curl "http://localhost:8000/api/v1/patterns/findings?rule_id=unchecked-return"
# Get statistics
curl http://localhost:8000/api/v1/patterns/stats
# List rules
curl http://localhost:8000/api/v1/patterns/rules
# Generate rule
curl -X POST http://localhost:8000/api/v1/patterns/generate \
-H "Content-Type: application/json" \
-d '{"description": "find unchecked return values", "language": "c"}'
# Apply fix (dry run)
curl -X POST http://localhost:8000/api/v1/patterns/fix \
-H "Content-Type: application/json" \
-d '{"rule_id": "unchecked-return", "dry_run": true}'
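The same search call can be made from Python with the standard library. This is a minimal sketch that mirrors the first curl example: the base URL and JSON fields are taken from that example, while the helper name `build_search_request` is hypothetical and the response schema is not documented here, so the sketch stops at building the request.

```python
import json
from urllib.request import Request

# Base URL taken from the curl examples; adjust for your deployment.
BASE_URL = "http://localhost:8000/api/v1/patterns"


def build_search_request(pattern: str, language: str, max_results: int = 50) -> Request:
    """Build (but do not send) a POST request for the ad-hoc search endpoint."""
    body = json.dumps(
        {"pattern": pattern, "language": language, "max_results": max_results}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("malloc($x)", "c")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the body with `json.load`.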
MCP Tools
6 tools registered in src/mcp/tools/patterns.py:
| Tool | Parameters | Description |
|---|---|---|
| codegraph_pattern_search | pattern, language, max_results | AST-based structural search |
| codegraph_pattern_findings | rule_id?, severity?, filename?, category?, limit | Query persisted findings |
| codegraph_pattern_stats | (none) | Aggregated statistics |
| codegraph_pattern_fix | rule_id?, dry_run | Apply SSR fixes |
| codegraph_pattern_generate | description, language, with_fix | LLM-generate YAML rule |
| codegraph_pattern_test | rule_yaml, code_snippet, language | Test rule against code snippet |
Data Models
Key models from src/services/gocpg/models.py:
| Model | Key Fields |
|---|---|
| ScanConfig | rule_dirs, rule_id, severity_filter, incremental, fix, dry_run, output_format, output_path, domain_rules |
| GoCPGScanResult | findings, rules_evaluated, files_scanned, total_matches, duration_ms, incremental, sarif_path |
| GoCPGSearchResult | matches, pattern, language, files_searched |
| GeneratedRule | yaml_text, rule_id, language, has_fix, validated, validation_errors, generation_attempts |
LLM Rule Generation
LLMPatternGenerator in src/analysis/patterns/llm_pattern_generator.py generates YAML rules from natural language descriptions:
- Builds a structured prompt from config/prompts/patterns/generate_pattern.yaml
- Calls the configured LLM provider
- Parses YAML from the response
- Validates via gocpg validate-rule (up to 3 retry attempts on failure)
- Returns GeneratedRule with validation status
# CLI
python -m src.cli patterns generate "find SQL queries built with string concatenation" --lang python --output rule.yaml
# MCP
codegraph_pattern_generate(description="find unchecked malloc calls", language="c", with_fix=true)
The generate_rule() method is async. The with_fix parameter controls whether a fix: template is included in the generated rule.
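The generate, validate, and retry cycle described above can be sketched as a small async loop. This is an illustration under stated assumptions, not the LLMPatternGenerator implementation: `call_llm` and `validate` are hypothetical injected callables, `GeneratedRuleSketch` copies only a few field names from the Data Models table with assumed types, and "up to 3 retry attempts" is read here as at most three attempts in total.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class GeneratedRuleSketch:
    """Subset of the GeneratedRule fields; the types are assumptions."""
    yaml_text: str
    validated: bool = False
    validation_errors: list[str] = field(default_factory=list)
    generation_attempts: int = 0


async def generate_with_retries(
    description: str,
    call_llm: Callable[[str, list[str]], Awaitable[str]],  # hypothetical LLM hook
    validate: Callable[[str], list[str]],                  # returns a list of errors
    max_attempts: int = 3,                                 # assumed reading of "up to 3"
) -> GeneratedRuleSketch:
    errors: list[str] = []
    rule = GeneratedRuleSketch(yaml_text="")
    for attempt in range(1, max_attempts + 1):
        # Earlier validation errors are fed back so the next attempt can fix them.
        yaml_text = await call_llm(description, errors)
        errors = validate(yaml_text)
        rule = GeneratedRuleSketch(yaml_text, not errors, errors, attempt)
        if rule.validated:
            break
    return rule


if __name__ == "__main__":
    async def fake_llm(description: str, errors: list[str]) -> str:
        # First attempt is deliberately invalid; the retry succeeds.
        return "id: demo-rule" if errors else "not a rule"

    def fake_validate(text: str) -> list[str]:
        return [] if text.startswith("id:") else ["missing id"]

    print(asyncio.run(generate_with_retries("demo", fake_llm, fake_validate)))
```

Injecting the LLM call and the validator keeps the loop testable without a model provider or the gocpg binary. In a real setup the validator would wrap the `gocpg validate-rule` step.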
Security Integration
PatternTaintBridge
PatternTaintBridge in src/analysis/patterns/taint_bridge.py enriches structural pattern findings with taint analysis data from S02 (Security Audit):
- Constructor: PatternTaintBridge(cpg_service, taint_propagator=None)
- Method: enrich_findings_with_taint(findings) (async)
- For findings with has_cpg=True, queries CPG for taint paths flowing through the matched node
- Adds taint_paths and taint_enriched keys to each finding dict
This bridges the gap between structural pattern detection and security vulnerability analysis — a pattern finding at a specific code location can be cross-referenced with taint flow data to determine if untrusted data reaches that location.
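The enrichment step can be pictured with a short sketch. It is not the PatternTaintBridge code: `lookup_taint_paths` is a hypothetical stand-in for the CPG query, and giving findings without CPG context an empty `taint_paths` list with `taint_enriched=False` is an assumption about how "each finding" receives both keys.

```python
import asyncio
from typing import Awaitable, Callable


async def enrich_findings_with_taint(
    findings: list[dict],
    lookup_taint_paths: Callable[[dict], Awaitable[list]],  # hypothetical CPG query
) -> list[dict]:
    """Add taint_paths and taint_enriched keys to every finding dict in place."""
    for finding in findings:
        if finding.get("has_cpg"):
            # Only findings with CPG context are looked up.
            finding["taint_paths"] = await lookup_taint_paths(finding)
            finding["taint_enriched"] = True
        else:
            # Assumed defaults for findings without CPG context.
            finding["taint_paths"] = []
            finding["taint_enriched"] = False
    return findings


if __name__ == "__main__":
    async def fake_lookup(finding: dict) -> list:
        return [["user_input", finding["code"], "sink"]]

    sample = [
        {"code": "strcpy(dst, src)", "has_cpg": True},
        {"code": "x = 1", "has_cpg": False},
    ]
    print(asyncio.run(enrich_findings_with_taint(sample, fake_lookup)))
```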
SSR Autofix Bridge
SSRAutofixBridge in src/analysis/autofix/ssr_bridge.py connects pattern engine SSR rules with the AutofixEngine described in Security Audit: Autofix:
- Maps vulnerability types to SSR rule IDs in configs/rules/autofix/
- Runs gocpg scan --fix --dry-run per file batch to get AST-aware fix previews
- Converts results to FixSuggestion objects for the autofix pipeline
Vulnerability type mapping (excerpt):
| Vulnerability | SSR Rules |
|---|---|
| sql_injection | autofix-sprintf-snprintf, autofix-py-format-sql, autofix-go-sprintf-sql, … |
| buffer_overflow | autofix-sprintf-snprintf, autofix-strcpy, autofix-py-ctypes, autofix-go-cgo-strcpy |
| null_dereference | autofix-null-assert |
| command_injection | autofix-py-subprocess, autofix-go-exec |
The flow: Pattern scan detects code issues → SSRAutofixBridge maps to fix rules → AutofixEngine generates patches → DiffValidator verifies. SSR fixes have the highest confidence (0.8-1.0) in the autofix pipeline. See Security Audit: Autofix for the full autofix architecture.
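The first step of that flow, mapping a vulnerability type to candidate SSR rules, amounts to a dictionary lookup. The sketch below copies only the rule IDs listed in the excerpt table; the `sql_injection` entry is truncated in the source, so the list here is incomplete. The names `SSR_RULES_BY_VULNERABILITY` and `rules_for` are hypothetical.

```python
# Rule IDs copied from the excerpt table above. The sql_injection row is
# truncated in the source ("..."), so this list is known to be incomplete.
SSR_RULES_BY_VULNERABILITY: dict[str, list[str]] = {
    "sql_injection": [
        "autofix-sprintf-snprintf",
        "autofix-py-format-sql",
        "autofix-go-sprintf-sql",
    ],
    "buffer_overflow": [
        "autofix-sprintf-snprintf",
        "autofix-strcpy",
        "autofix-py-ctypes",
        "autofix-go-cgo-strcpy",
    ],
    "null_dereference": ["autofix-null-assert"],
    "command_injection": ["autofix-py-subprocess", "autofix-go-exec"],
}


def rules_for(vulnerability_type: str) -> list[str]:
    """Return a copy of the SSR rule IDs for a type, or [] if it is unmapped."""
    return list(SSR_RULES_BY_VULNERABILITY.get(vulnerability_type, []))


if __name__ == "__main__":
    print(rules_for("command_injection"))
    print(rules_for("xss"))
```

Returning an empty list for unmapped types lets a caller fall through to other fix strategies without special-casing missing keys.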
DB Tables
GoCPG persists scan results in DuckDB:
- cpg_pattern_results (findings): id, rule_id, severity, category, filename, line_number, column_number, code, message, confidence, match_data, cpg_context
- cpg_pattern_rules (loaded rules): rule_id, language, severity, category, has_cpg, rule_source
Python reads these via PatternQueriesMixin methods: get_pattern_findings(), get_pattern_rules(), get_pattern_statistics().
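For ad-hoc inspection outside the mixin, a filtered query over `cpg_pattern_results` can be built with positional parameters. This is a sketch, not the `get_pattern_findings()` implementation: the helper name `build_findings_query`, the selected columns, the ordering, and the default limit are all assumptions. Only the table and column names come from the list above.

```python
def build_findings_query(rule_id=None, severity=None, filename=None,
                         category=None, limit=100):
    """Return (sql, params) for a filtered read of cpg_pattern_results."""
    filters = {
        "rule_id": rule_id,
        "severity": severity,
        "filename": filename,
        "category": category,
    }
    clauses, params = [], []
    for column, value in filters.items():
        if value is not None:
            # Column names come from the fixed dict above; values are bound
            # as parameters rather than interpolated into the SQL text.
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = ("SELECT rule_id, severity, filename, line_number, message "
           "FROM cpg_pattern_results")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY filename, line_number LIMIT ?"
    params.append(limit)
    return sql, params


if __name__ == "__main__":
    sql, params = build_findings_query(rule_id="unchecked-return", severity="warning")
    print(sql)
    print(params)
```

With the `duckdb` Python package installed, the pair can be run as `duckdb.connect(path, read_only=True).execute(sql, params).fetchall()`, where `path` points at the project database, for example the `data/projects/test.duckdb` file used in the CLI examples.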
Example Questions
Find unchecked return values
Find malloc without free
Show functions matching error-handling anti-patterns
Find SQL query construction without parameterization
Find all functions with cyclomatic complexity > 20
Related
- Security Audit: Autofix — Autofix pipeline using SSR rules from pattern engine
- Security Audit — Vulnerability detection via taint analysis (PatternTaintBridge links to this)
- Composite Workflows — Orchestration guide