Structural Pattern Search

Structural Pattern Search lets developers and security engineers find code that matches structural patterns, optionally narrowed by CPG-aware constraints. It is powered by GoCPG’s AST-based engine.

Table of Contents¶

Quick Start
How It Works
Architecture
Pattern Types
YAML Rules
CLI Usage
REST API
MCP Tools
Data Models
LLM Rule Generation
Security Integration
PatternTaintBridge
SSR Autofix Bridge
DB Tables
Example Questions
Related

Quick Start¶

/select pattern_search

Or via CLI:

# Ad-hoc pattern search (single quotes stop the shell from expanding $x)
python -m src.cli patterns search 'malloc($x)' --lang c

# Scan with all rules
python -m src.cli patterns scan

# Scan specific rule
python -m src.cli patterns scan --rule unchecked-return

How It Works¶

Architecture¶

Structural Pattern Search is a standalone tool (not a LangGraph scenario) accessible via CLI, REST API, and MCP. The backend is GoCPG’s pattern engine, which GoCPGClient reaches over gRPC:

CLI / REST API / MCP
        |
        v
  GoCPGClient (.scan(), .search(), .validate_rule())
        |  gRPC
        v
  gocpg binary (AST matching + CPG constraints)
        |
        v
  DuckDB (cpg_pattern_results, cpg_pattern_rules)
        |
        v
  PatternQueriesMixin (Python reads persisted results)
        |
        +---> PatternTaintBridge (enrich with taint paths)
        +---> SSRAutofixBridge (generate security fixes)

GoCPGClient.scan() runs YAML rules against the CPG database. GoCPGClient.search() performs ad-hoc AST pattern matching. Results are persisted in cpg_pattern_results and queried by PatternQueriesMixin in src/services/cpg/pattern_queries.py.

Pattern Types¶

Syntactic patterns — tree-sitter CST patterns with metavariables:

Metavariable	Matches
`$VAR`	Any single expression or identifier
`$$ARGS`	Zero or more arguments
`$_`	Any node (wildcard)

# Find malloc calls (single-quote patterns so the shell does not expand $x)
python -m src.cli patterns search 'malloc($x)' --lang c

# Find if-return without else
python -m src.cli patterns search 'if ($cond) { return $val; }' --lang c
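The metavariable semantics in the table can be illustrated with a toy matcher. This is only a sketch of the idea, not GoCPG's engine: the nested-tuple node encoding and the `match` / `match_seq` helpers are hypothetical, and the real engine works on tree-sitter nodes.

```python
def match(pattern, node, bindings):
    """Match one pattern element against one node.

    Returns an updated bindings dict on success, or None on failure.
    """
    if isinstance(pattern, str):
        if pattern == "$_":
            return bindings  # wildcard: matches any node, binds nothing
        if pattern.startswith("$") and not pattern.startswith("$$"):
            # $VAR: binds exactly one node; a repeated name must match the same node
            if pattern in bindings and bindings[pattern] != node:
                return None
            return {**bindings, pattern: node}
        return bindings if pattern == node else None  # literal token
    if not isinstance(node, tuple):
        return None
    return match_seq(list(pattern), list(node), bindings)


def match_seq(patterns, nodes, bindings):
    """Match a sequence of pattern elements against a sequence of child nodes."""
    if not patterns:
        return bindings if not nodes else None
    head, rest = patterns[0], patterns[1:]
    if isinstance(head, str) and head.startswith("$$"):
        # $$ARGS: try to absorb zero or more nodes, shortest split first
        for i in range(len(nodes) + 1):
            result = match_seq(rest, nodes[i:], {**bindings, head: tuple(nodes[:i])})
            if result is not None:
                return result
        return None
    if not nodes:
        return None
    updated = match(head, nodes[0], bindings)
    if updated is None:
        return None
    return match_seq(rest, nodes[1:], updated)


# A call is encoded here as ("call", callee, arg1, arg2, ...).
print(match(("call", "malloc", "$x"), ("call", "malloc", "n * 4"), {}))
print(match(("call", "$func", "$$args"), ("call", "open", "path", "flags"), {}))
```

In this model `$x` always consumes exactly one child, while `$$args` may bind an empty tuple, which is why a pattern such as `$func($$args)` can also match a call with no arguments.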

CPG-constrained patterns — patterns with data flow, call graph, type, and domain constraints:

id: unchecked-return
pattern: "$ret = $func($$args)"
language: c
constraints:
  - type: data_flow
    from: "$ret"
    not_reaches: "if ($ret"
  - type: call_graph
    callee: "$func"
    returns: "int"
message: "Return value of $func is not checked"
severity: warning
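The `message` field in the rule above reuses the pattern's metavariable `$func`. One plausible reading is that each bound metavariable is substituted into the message when a finding is reported. The sketch below shows that substitution under this assumption; `render_message` is a hypothetical helper, not a GoCPG function.

```python
import re

# Matches $name and $$name style metavariables.
METAVAR = re.compile(r"\$\$?[A-Za-z_][A-Za-z0-9_]*")


def render_message(template, bindings):
    """Replace each metavariable in template with its bound text.

    Metavariables without a binding are left as they are.
    """
    def substitute(found):
        name = found.group(0)
        return bindings.get(name, name)

    return METAVAR.sub(substitute, template)


print(render_message("Return value of $func is not checked", {"$func": "read"}))
```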

YAML Rules¶

Pre-defined rules live in configs/rules/: 190 rules across 14 languages. Domain-specific rules from the active domain plugin can be auto-loaded with --domain-rules.

CLI Usage¶

python -m src.cli patterns has 6 subcommands (scan, search, fix, list, stats, generate):

# Scan with all rules
python -m src.cli patterns scan --db data/projects/test.duckdb

# Scan with severity filter and SARIF output
python -m src.cli patterns scan --severity error --format sarif --output results.sarif

# Scan with domain-specific rules auto-loaded
python -m src.cli patterns scan --domain-rules

# Scan with incremental evaluation
python -m src.cli patterns scan --incremental

# Scan a specific rule
python -m src.cli patterns scan --rule unchecked-return

# Ad-hoc pattern search (single quotes stop the shell from expanding $SIZE)
python -m src.cli patterns search 'malloc($SIZE)' --lang c --max-results 50

# Apply fixes (dry run)
python -m src.cli patterns fix --dry-run

# Apply fixes (with approval)
python -m src.cli patterns fix --rule unchecked-return

# List loaded pattern rules
python -m src.cli patterns list

# Show pattern statistics
python -m src.cli patterns stats

# Generate rule from natural language
python -m src.cli patterns generate "find unchecked return values" --lang c --output rule.yaml

Output formats: text (default), json, sarif. The fix subcommand uses ApprovalEngine for interactive approval before applying changes.
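When the scan is driven from a script rather than typed into a shell, the command line can be assembled as an argument list. The sketch below uses only the flags shown above; `build_scan_argv` is a hypothetical helper, and its keyword names are chosen for illustration.

```python
import sys


def build_scan_argv(db=None, rule=None, severity=None, output_format=None,
                    output=None, domain_rules=False, incremental=False):
    """Build the argv list for `python -m src.cli patterns scan`."""
    argv = [sys.executable, "-m", "src.cli", "patterns", "scan"]
    if db:
        argv += ["--db", db]
    if rule:
        argv += ["--rule", rule]
    if severity:
        argv += ["--severity", severity]
    if output_format:
        argv += ["--format", output_format]
    if output:
        argv += ["--output", output]
    if domain_rules:
        argv.append("--domain-rules")
    if incremental:
        argv.append("--incremental")
    return argv


argv = build_scan_argv(severity="error", output_format="sarif", output="results.sarif")
print(" ".join(argv[1:]))
# To run it from the project root: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` bypasses the shell, so no quoting or escaping of the arguments is needed.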

REST API¶

6 endpoints mounted at /api/v1/patterns/:

Method	Endpoint	Description
`POST`	`/api/v1/patterns/search`	Ad-hoc structural pattern search
`GET`	`/api/v1/patterns/findings`	Query persisted pattern findings (filters: rule_id, severity, filename, category)
`GET`	`/api/v1/patterns/stats`	Aggregated statistics by severity, category, rule
`GET`	`/api/v1/patterns/rules`	List loaded pattern rules from `cpg_pattern_rules`
`POST`	`/api/v1/patterns/generate`	LLM-generate a YAML rule from description
`POST`	`/api/v1/patterns/fix`	Apply SSR fixes (dry_run=true by default, approval required)

Example:

# Search patterns
curl -X POST http://localhost:8000/api/v1/patterns/search \
  -H "Content-Type: application/json" \
  -d '{"pattern": "malloc($x)", "language": "c", "max_results": 50}'

# Get findings for a rule
curl "http://localhost:8000/api/v1/patterns/findings?rule_id=unchecked-return"

# Get statistics
curl http://localhost:8000/api/v1/patterns/stats

# List rules
curl http://localhost:8000/api/v1/patterns/rules

# Generate rule
curl -X POST http://localhost:8000/api/v1/patterns/generate \
  -H "Content-Type: application/json" \
  -d '{"description": "find unchecked return values", "language": "c"}'

# Apply fix (dry run)
curl -X POST http://localhost:8000/api/v1/patterns/fix \
  -H "Content-Type: application/json" \
  -d '{"rule_id": "unchecked-return", "dry_run": true}'
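The same search call can be made from Python with the standard library. The sketch below only builds the request, mirroring the first curl example. The base URL is the one used in the examples above; `build_search_request` is a hypothetical helper, and the response schema is not documented in this section, so no parsing is shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1/patterns"


def build_search_request(pattern, language, max_results=50):
    """Build a POST request for the ad-hoc search endpoint."""
    body = json.dumps({
        "pattern": pattern,
        "language": language,
        "max_results": max_results,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("malloc($x)", "c")
print(req.get_method(), req.full_url)
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     payload = json.load(resp)
```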

MCP Tools¶

6 tools registered in src/mcp/tools/patterns.py:

Tool	Parameters	Description
`codegraph_pattern_search`	pattern, language, max_results	AST-based structural search
`codegraph_pattern_findings`	rule_id?, severity?, filename?, category?, limit	Query persisted findings
`codegraph_pattern_stats`	(none)	Aggregated statistics
`codegraph_pattern_fix`	rule_id?, dry_run	Apply SSR fixes
`codegraph_pattern_generate`	description, language, with_fix	LLM-generate YAML rule
`codegraph_pattern_test`	rule_yaml, code_snippet, language	Test rule against code snippet

Data Models¶

Key models from src/services/gocpg/models.py:

Model	Key Fields
`ScanConfig`	rule_dirs, rule_id, severity_filter, incremental, fix, dry_run, output_format, output_path, domain_rules
`GoCPGScanResult`	findings, rules_evaluated, files_scanned, total_matches, duration_ms, incremental, sarif_path
`GoCPGSearchResult`	matches, pattern, language, files_searched
`GeneratedRule`	yaml_text, rule_id, language, has_fix, validated, validation_errors, generation_attempts

LLM Rule Generation¶

LLMPatternGenerator in src/analysis/patterns/llm_pattern_generator.py generates YAML rules from natural language descriptions:

1. Builds a structured prompt from config/prompts/patterns/generate_pattern.yaml
2. Calls the configured LLM provider
3. Parses YAML from the response
4. Validates via gocpg validate-rule (up to 3 retry attempts on failure)
5. Returns GeneratedRule with validation status

# CLI
python -m src.cli patterns generate "find SQL queries built with string concatenation" --lang python --output rule.yaml

# MCP
codegraph_pattern_generate(description="find unchecked malloc calls", language="c", with_fix=true)

The generate_rule() method is async. The with_fix parameter controls whether a fix: template is included in the generated rule.

Security Integration¶

PatternTaintBridge¶

PatternTaintBridge in src/analysis/patterns/taint_bridge.py enriches structural pattern findings with taint analysis data from S02 (Security Audit):

Constructor: PatternTaintBridge(cpg_service, taint_propagator=None)
Method: enrich_findings_with_taint(findings) (async)
For findings with has_cpg=True, queries CPG for taint paths flowing through the matched node
Adds taint_paths and taint_enriched keys to each finding dict

This bridges the gap between structural pattern detection and security vulnerability analysis — a pattern finding at a specific code location can be cross-referenced with taint flow data to determine if untrusted data reaches that location.
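The enrichment step can be pictured with the sketch below. It is not the PatternTaintBridge code: `find_taint_paths` is a hypothetical async lookup standing in for the CPG query, and leaving findings without CPG data untouched is an assumption, since the behaviour for those findings is not spelled out here.

```python
import asyncio


async def enrich_findings_with_taint(findings, find_taint_paths):
    """Return copies of the findings, adding taint data where CPG context exists."""
    enriched = []
    for finding in findings:
        result = dict(finding)  # do not mutate the caller's dicts
        if result.get("has_cpg"):
            result["taint_paths"] = await find_taint_paths(
                result["filename"], result["line_number"]
            )
            result["taint_enriched"] = True
        enriched.append(result)
    return enriched


# Stub lookup and invented sample findings, for demonstration only.
async def _stub_lookup(filename, line_number):
    return [f"argv -> {filename}:{line_number}"]


sample = [
    {"rule_id": "r1", "has_cpg": True, "filename": "a.c", "line_number": 10},
    {"rule_id": "r2", "has_cpg": False, "filename": "b.c", "line_number": 3},
]
enriched_sample = asyncio.run(enrich_findings_with_taint(sample, _stub_lookup))
print(enriched_sample[0]["taint_paths"])
```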

SSR Autofix Bridge¶

SSRAutofixBridge in src/analysis/autofix/ssr_bridge.py connects pattern engine SSR rules with the AutofixEngine described in Security Audit: Autofix:

Maps vulnerability types to SSR rule IDs in configs/rules/autofix/
Runs gocpg scan --fix --dry-run per file batch to get AST-aware fix previews
Converts results to FixSuggestion objects for the autofix pipeline

Vulnerability type mapping (excerpt):

Vulnerability	SSR Rules
sql_injection	autofix-sprintf-snprintf, autofix-py-format-sql, autofix-go-sprintf-sql, …
buffer_overflow	autofix-sprintf-snprintf, autofix-strcpy, autofix-py-ctypes, autofix-go-cgo-strcpy
null_dereference	autofix-null-assert
command_injection	autofix-py-subprocess, autofix-go-exec

The flow: Pattern scan detects code issues → SSRAutofixBridge maps to fix rules → AutofixEngine generates patches → DiffValidator verifies. SSR fixes have the highest confidence (0.8-1.0) in the autofix pipeline. See Security Audit: Autofix for the full autofix architecture.
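The mapping step can be sketched as a plain lookup table. The entries below are copied from the excerpt above; the sql_injection list is truncated in that excerpt and is left incomplete here as well. `rules_for` is a hypothetical helper, not part of SSRAutofixBridge.

```python
VULN_TO_SSR_RULES = {
    # Truncated in the source excerpt; further rule IDs exist but are not listed.
    "sql_injection": [
        "autofix-sprintf-snprintf",
        "autofix-py-format-sql",
        "autofix-go-sprintf-sql",
    ],
    "buffer_overflow": [
        "autofix-sprintf-snprintf",
        "autofix-strcpy",
        "autofix-py-ctypes",
        "autofix-go-cgo-strcpy",
    ],
    "null_dereference": ["autofix-null-assert"],
    "command_injection": ["autofix-py-subprocess", "autofix-go-exec"],
}


def rules_for(vulnerability_types):
    """Collect SSR rule IDs for the given types, without duplicates, in first-seen order."""
    seen = set()
    ordered = []
    for vuln in vulnerability_types:
        for rule_id in VULN_TO_SSR_RULES.get(vuln, []):
            if rule_id not in seen:
                seen.add(rule_id)
                ordered.append(rule_id)
    return ordered


print(rules_for(["sql_injection", "buffer_overflow"]))
```

Deduplication matters because a rule such as autofix-sprintf-snprintf is listed under more than one vulnerability type and should be scanned once per batch.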

DB Tables¶

GoCPG persists scan results in DuckDB:

cpg_pattern_results — findings: id, rule_id, severity, category, filename, line_number, column_number, code, message, confidence, match_data, cpg_context
cpg_pattern_rules — loaded rules: rule_id, language, severity, category, has_cpg, rule_source

Python reads these via PatternQueriesMixin methods: get_pattern_findings(), get_pattern_rules(), get_pattern_statistics().
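The kind of query these methods run can be illustrated against a throwaway table. The sketch uses the standard-library sqlite3 module as a stand-in for DuckDB so that it runs without extra packages. The column types, the subset of columns, and the sample rows are invented for illustration, and `count_by_severity` / `findings_for_rule` are hypothetical helpers rather than the PatternQueriesMixin methods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cpg_pattern_results (
        id INTEGER PRIMARY KEY,
        rule_id TEXT,
        severity TEXT,
        filename TEXT,
        line_number INTEGER,
        message TEXT
    )
    """
)
# Invented sample rows.
conn.executemany(
    "INSERT INTO cpg_pattern_results (rule_id, severity, filename, line_number, message) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("unchecked-return", "warning", "io.c", 42, "Return value of read is not checked"),
        ("unchecked-return", "warning", "net.c", 7, "Return value of send is not checked"),
        ("autofix-strcpy", "error", "util.c", 13, "strcpy into fixed-size buffer"),
    ],
)


def count_by_severity(connection):
    """Aggregate findings per severity level."""
    rows = connection.execute(
        "SELECT severity, COUNT(*) FROM cpg_pattern_results GROUP BY severity"
    ).fetchall()
    return dict(rows)


def findings_for_rule(connection, rule_id):
    """Return (filename, line_number) pairs for one rule, ordered by location."""
    return connection.execute(
        "SELECT filename, line_number FROM cpg_pattern_results "
        "WHERE rule_id = ? ORDER BY filename, line_number",
        (rule_id,),
    ).fetchall()


print(count_by_severity(conn))
print(findings_for_rule(conn, "unchecked-return"))
```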

Example Questions¶

Find unchecked return values
Find malloc without free
Show functions matching error-handling anti-patterns
Find SQL query construction without parameterization
Find all functions with cyclomatic complexity > 20

Related¶

Security Audit: Autofix (autofix pipeline using SSR rules from the pattern engine)
Security Audit (vulnerability detection via taint analysis; PatternTaintBridge links to this)
Composite Workflows (orchestration guide)
