Scenario 21: Structural Pattern Search

For developers and security engineers who need to find code that matches structural patterns under code property graph (CPG) constraints.

Quick Start

/select pattern_search

Or via CLI:

# Ad-hoc pattern search (single quotes stop a POSIX shell from expanding $x)
python -m src.cli patterns search 'malloc($x)' --lang c

# Scan with all rules
python -m src.cli patterns scan

# Scan specific rule
python -m src.cli patterns scan --rule unchecked-return

Overview

Structural Pattern Search parses source into a tree-sitter concrete syntax tree (CST) and combines it with code property graph (CPG) constraints (data flow, call graph, types, domain annotations) to find code matching complex patterns. Unlike regex-based grep, it matches on code structure rather than raw text, so a single rule can relate nodes that are not adjacent in the syntax tree.

Pattern Types

Syntactic Patterns

Tree-sitter CST patterns with metavariables:

Metavariable   Matches
$VAR           Any single expression or identifier
$$ARGS         Zero or more arguments
$_             Any node (wildcard)

# Find malloc calls
python -m src.cli patterns search 'malloc($x)' --lang c

# Find if-return without else
python -m src.cli patterns search 'if ($cond) { return $val; }' --lang c
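
To make the metavariable semantics above concrete, here is a toy matcher over nested tuples. It is only a sketch: the tuple-based tree, the `match` and `match_children` helpers, and the binding format are all hypothetical and do not reflect how the tool represents tree-sitter nodes internally.

```python
# Toy illustration only: a tree is a nested tuple ("kind", child, ...) and a
# leaf is a plain string. This is NOT the tool's real matcher or data model.

def match(pattern, node, bindings=None):
    """Return a dict of metavariable bindings if pattern matches node, else None."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str):
        if pattern == "$_":
            return bindings                      # wildcard: matches anything, binds nothing
        if pattern.startswith("$") and not pattern.startswith("$$"):
            if pattern in bindings and bindings[pattern] != node:
                return None                      # a repeated metavariable must bind the same node
            bindings[pattern] = node
            return bindings
        return bindings if pattern == node else None
    # Pattern is a tuple: node must be a tuple of the same kind.
    if not isinstance(node, tuple) or not node or pattern[0] != node[0]:
        return None
    return match_children(pattern[1:], node[1:], bindings)


def match_children(pats, nodes, bindings):
    """Match a sequence of child patterns against a sequence of child nodes."""
    if not pats:
        return bindings if not nodes else None
    head = pats[0]
    if isinstance(head, str) and head.startswith("$$"):
        # "$$NAME" may absorb zero or more siblings: try every split point.
        for i in range(len(nodes) + 1):
            trial = dict(bindings)
            trial[head] = list(nodes[:i])
            rest = match_children(pats[1:], nodes[i:], trial)
            if rest is not None:
                return rest
        return None
    if not nodes:
        return None
    bound = match(head, nodes[0], bindings)
    if bound is None:
        return None
    return match_children(pats[1:], nodes[1:], bound)


# Rough analogue of the pattern malloc($x) matched against malloc(n)
print(match(("call", "malloc", "$x"), ("call", "malloc", "n")))
```

The same helper shows why `$$ARGS` is useful for calls of any arity: `("call", "$func", "$$args")` matches both `("call", "rand")` and `("call", "printf", "fmt", "a")`.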

CPG-Constrained Patterns

Patterns with data flow, call graph, type, and domain constraints:

id: unchecked-return
pattern: "$ret = $func($$args)"
language: c
constraints:
  - type: data_flow
    from: "$ret"
    not_reaches: "if ($ret"
  - type: call_graph
    callee: "$func"
    returns: "int"
message: "Return value of $func is not checked"
severity: warning
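
The `message` field reuses metavariables from `pattern`. A minimal sketch of how such a message could be filled in from a match, assuming bindings are available as a name-to-text mapping without the leading `$` (the `render_message` helper and that binding format are hypothetical, not the tool's documented behavior):

```python
from string import Template

# A subset of the rule above as a plain dict, roughly what a YAML loader would return.
rule = {
    "id": "unchecked-return",
    "pattern": "$ret = $func($$args)",
    "language": "c",
    "message": "Return value of $func is not checked",
    "severity": "warning",
}


def render_message(rule, bindings):
    """Substitute metavariable bindings into the rule's message text."""
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(rule["message"]).safe_substitute(bindings)


print(render_message(rule, {"ret": "n", "func": "read"}))
# prints: Return value of read is not checked
```

`string.Template` treats `$$` as an escaped dollar sign, so this shortcut suits message text only, not patterns containing `$$args`.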

YAML Rules

Predefined rules live in configs/rules/: 190 rules across 14 languages.

# List all available rules
python -m src.cli patterns list

# Show rule statistics
python -m src.cli patterns stats

Example Queries

Find unchecked return values
Find malloc without free
Show functions matching error-handling anti-patterns
Find SQL query construction without parameterization
Find all functions with cyclomatic complexity > 20

Usage

CLI

# Search with pattern
python -m src.cli patterns search 'malloc($x)' --lang c --max-results 50

# Scan all rules
python -m src.cli patterns scan --db data/projects/postgres.duckdb

# Scan specific severity
python -m src.cli patterns scan --severity error

# Generate rule from natural language
python -m src.cli patterns generate "find unchecked return values" --lang c --output rule.yaml

# Validate rule
gocpg validate-rule --file rule.yaml

# Autofix (dry run)
python -m src.cli patterns fix --dry-run

# Autofix (apply)
python -m src.cli patterns fix --rule unchecked-return

Programmatic

from src.workflow import MultiScenarioCopilot

copilot = MultiScenarioCopilot()
result = copilot.run("Find unchecked return values", scenario="pattern_search")

for finding in result.get('findings', []):
    print(f"{finding['rule_id']}: {finding['file']}:{finding['line']}")
    print(f"  {finding['message']}")
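
When a scan returns many findings, grouping them by rule can make the output easier to triage. The sketch below assumes only the finding keys used in the loop above (`rule_id`, `file`, `line`, `message`); the `group_by_rule` helper and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_rule(findings):
    """Map each rule_id to the list of 'file:line' locations it fired on."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding["rule_id"]].append(f"{finding['file']}:{finding['line']}")
    return dict(grouped)


# Hypothetical sample shaped like result.get('findings', [])
sample_findings = [
    {"rule_id": "unchecked-return", "file": "a.c", "line": 10, "message": "m1"},
    {"rule_id": "unchecked-return", "file": "b.c", "line": 42, "message": "m2"},
    {"rule_id": "other-rule", "file": "a.c", "line": 7, "message": "m3"},
]

for rule_id, locations in group_by_rule(sample_findings).items():
    print(f"{rule_id}: {len(locations)} finding(s)")
    for location in locations:
        print(f"  {location}")
```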

API

# Search patterns
POST /api/v1/patterns/search
{
  "pattern": "malloc($x)",
  "language": "c",
  "max_results": 50
}

# Get findings for a rule
POST /api/v1/patterns/findings
{
  "rule_id": "unchecked-return"
}

# Generate rule from description
POST /api/v1/patterns/generate
{
  "description": "find unchecked return values",
  "language": "c"
}
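
The search endpoint can be called from Python with the standard library alone. This is a sketch under assumptions: the base URL is a placeholder (the host and port are not stated here), the `build_search_request` helper is hypothetical, and any authentication or response schema the server uses is not covered.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server's address


def build_search_request(pattern, language, max_results=50, base_url=BASE_URL):
    """Build a POST request for /api/v1/patterns/search with a JSON body."""
    payload = {"pattern": pattern, "language": language, "max_results": max_results}
    return urllib.request.Request(
        f"{base_url}/api/v1/patterns/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("malloc($x)", "c")
print(req.get_method(), req.full_url)

# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect, and Python string literals need no escaping for `$x`, unlike a shell command line.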

MCP

Available as MCP tools: codegraph_pattern_search, codegraph_pattern_findings, codegraph_pattern_stats, codegraph_pattern_fix, codegraph_pattern_generate, codegraph_pattern_test.