Symbolic Execution (Z3 SMT Solver)¶
Lightweight symbolic execution engine for path feasibility checking in taint analysis. Uses the Z3 SMT solver to filter out infeasible taint paths, reducing false positive rate by 5-7%.
Overview¶
When taint analysis finds a path from source to sink, the path may traverse branches with contradictory conditions (e.g., if (x > 10) followed by if (x < 5) on the same variable). Symbolic execution checks whether such paths are actually feasible by encoding branch conditions as Z3 constraints and calling the solver.
Key characteristics: - Optional component — gracefully degrades when Z3 is not installed - Adds ~20% time overhead, ~25% memory overhead - Maximum 20 constraints per path (configurable) - Supports integer comparisons, NULL checks, simple arithmetic
Installation¶
pip install z3-solver
Z3 is already included in requirements.txt. If Z3 is unavailable at runtime, the system logs a warning and returns conservative estimates (all paths assumed feasible).
Architecture¶
TaintPropagator (enable_symbolic_execution=True)
│
├─ find_taint_paths() → raw taint paths
│
└─ PathConstraintTracker.filter_infeasible_paths()
│
├─ _collect_constraints(start, end)
│ └─ Walk CFG edges, extract branch conditions from CDG
│
├─ _parse_constraint_to_z3(constraint, z3_vars)
│ └─ Convert "x > 100" → z3.Int('x') > 100
│
└─ z3.Solver().check()
├─ sat → path feasible (confidence 0.9)
├─ unsat → path infeasible, FILTER OUT (confidence 0.95)
└─ unknown → conservative, keep path (confidence 0.4)
API Reference¶
PathConstraint¶
@dataclass
class PathConstraint:
node_id: int # CPG node ID of the branch
condition: str # Condition expression (e.g., "x > 100")
location: str # File:line location
negated: bool # True if this is the else-branch
PathConstraintTracker¶
class PathConstraintTracker:
def __init__(self, cpg_service, max_constraints: int = 20):
"""
Args:
cpg_service: CPGQueryService instance for graph queries
max_constraints: Maximum constraints per path (default: 20).
Paths with more constraints are assumed feasible to avoid solver timeout.
"""
def is_path_feasible(self, start_node: int, end_node: int) -> Tuple[bool, float]:
"""
Check if a path between two nodes is feasible.
Returns:
(is_feasible, confidence) — confidence ranges from 0.0 to 1.0
"""
def filter_infeasible_paths(
self, paths: List[Any], min_confidence: float = 0.8
) -> List[Any]:
"""
Filter a list of taint paths, removing infeasible ones.
Args:
paths: List of taint path objects (must have .source_id and .sink_id)
min_confidence: Minimum confidence to consider a result reliable (default: 0.8)
Returns:
Filtered list with only feasible paths
"""
Usage¶
Direct API¶
from src.analysis.dataflow.symbolic_execution import PathConstraintTracker
tracker = PathConstraintTracker(cpg_service, max_constraints=20)
# Check a single path
is_feasible, confidence = tracker.is_path_feasible(node_123, node_456)
if not is_feasible and confidence > 0.9:
print("Path is infeasible — likely false positive")
# Filter a batch of taint paths
feasible_paths = tracker.filter_infeasible_paths(all_paths, min_confidence=0.8)
Integration with Taint Analysis¶
from src.analysis.dataflow.taint import TaintPropagator
propagator = TaintPropagator(
cpg_service,
enable_symbolic_execution=True # Enables Z3 path filtering
)
# Symbolic execution is applied automatically after taint propagation
paths = propagator.find_taint_paths(
sources=["recv", "getenv"],
sinks=["strcpy", "sprintf"]
)
# paths already filtered — infeasible paths removed
Supported Constraint Types¶
| Constraint | Example | Z3 Encoding |
|---|---|---|
| Integer comparison | x > 100 |
z3.Int('x') > 100 |
| Equality | x == 0 |
z3.Int('x') == 0 |
| Inequality | x != 5 |
z3.Int('x') != 5 |
| NULL check | ptr == NULL |
z3.Int('ptr') == 0 |
| nullptr check | ptr != nullptr |
z3.Int('ptr') != 0 |
| Simple arithmetic | x + y > z |
z3.Int('x') + z3.Int('y') > z3.Int('z') |
Not supported: function calls in conditions, complex expressions, floating-point, string operations.
Confidence Levels¶
| Solver Result | Feasible? | Confidence | Action |
|---|---|---|---|
sat |
Yes | 0.9 | Keep path |
unsat |
No | 0.95 | Remove path (false positive) |
unknown |
Assumed yes | 0.4 | Keep path (conservative) |
| Z3 unavailable | Assumed yes | 0.5 | Keep path (degraded mode) |
| Too many constraints | Assumed yes | 0.5 | Keep path (skip analysis) |
Performance Impact¶
| Metric | Without Z3 | With Z3 | Delta |
|---|---|---|---|
| Analysis time | Baseline | +20% | Solver overhead |
| Memory | Baseline | +25% | Z3 internal structures |
| False positive rate | ~12% | ~5-8% | Infeasible paths removed |
Limitations¶
- Lightweight only — not a full symbolic executor; only tracks branch conditions
- Integer-only — no floating-point or string constraint support
- Max 20 constraints — longer paths assumed feasible to avoid timeout
- No interprocedural — constraints collected only within a single function’s CFG
- Pattern-based parsing — condition expressions parsed via regex, not full AST
Related Documentation¶
- Dataflow Analysis Guide — taint analysis integration
- Analysis Modules — CFG, DDG, CDG analysis modules
- Security Reference — security analysis overview
- Hypothesis System — hypothesis-driven security validation
Module: src/analysis/dataflow/symbolic_execution.py
Last updated: February 2026