Symbolic execution engine for path feasibility checking in taint analysis. Uses the Z3 SMT solver to filter out infeasible taint paths, reducing false positive rate to 1-3%.
Overview¶
When taint analysis finds a path from source to sink, the path may traverse branches with contradictory conditions (e.g., if (x > 10) followed by if (x < 5) on the same variable). Symbolic execution checks whether such paths are actually feasible by encoding branch conditions as Z3 constraints and calling the solver.
Key characteristics: - Optional component — gracefully degrades when Z3 is not installed - Adds ~20-25% time overhead, ~25-30% memory overhead - Maximum 20 constraints per path (configurable) - Recursive descent parser for condition expressions - Supports integer comparisons, boolean logic, arithmetic, function models, NULL checks - Branch direction detection via CFG successor analysis - Configurable solver timeouts (500ms for LIA, 2s for UF queries)
Installation¶
pip install z3-solver
Z3 is already included in requirements.txt. If Z3 is unavailable at runtime, the system logs a warning and returns conservative estimates (all paths assumed feasible).
Architecture¶
TaintPropagator (enable_symbolic_execution=True)
│
├─ find_taint_paths() → raw taint paths
│
└─ PathConstraintTracker.filter_infeasible_paths()
│
├─ _collect_constraints(start, end)
│ ├─ Walk CFG edges, extract branch conditions from CDG
│ └─ _determine_branch_direction() via CFG successors
│
├─ _parse_condition(code, z3_vars, state)
│ └─ Recursive descent: or → and → not → comparison → additive → atom
│ ├─ Variables: z3.Int('x')
│ ├─ Functions: strlen(s) → fresh Int with domain constraints
│ └─ NULL: z3.Bool('ptr_is_null')
│
└─ z3.Solver().check() (with timeout)
├─ sat → path feasible (confidence 0.9)
├─ unsat → path infeasible, FILTER OUT (confidence 0.95)
└─ unknown → conservative, keep path (confidence 0.4)
API Reference¶
PathConstraint¶
@dataclass
class PathConstraint:
node_id: int # CPG node ID of the branch
condition: str # Condition expression (e.g., "x > 100")
location: str # File:line location
negated: bool # True if this is the else-branch
SymbolicExecutionConfig¶
@dataclass
class SymbolicExecutionConfig:
enabled: bool = True
max_constraints: int = 20
solver_timeout_ms: int = 500 # QF_LIA queries
solver_timeout_uf_ms: int = 2000 # Queries with uninterpreted functions
max_parse_depth: int = 10 # Recursive descent depth limit
enable_function_models: bool = True # strlen, sizeof, len models
enable_arithmetic: bool = True # +, -, * in conditions
PathConstraintTracker¶
class PathConstraintTracker:
def __init__(self, cpg_service, max_constraints: int = 20,
config: Optional[SymbolicExecutionConfig] = None):
"""
Args:
cpg_service: CPGQueryService instance for graph queries
max_constraints: Maximum constraints per path (default: 20).
Paths with more constraints are assumed feasible to avoid solver timeout.
config: Optional configuration for solver timeouts and features.
"""
def is_path_feasible(self, start_node: int, end_node: int) -> Tuple[bool, float]:
"""
Check if a path between two nodes is feasible.
Returns:
(is_feasible, confidence) — confidence ranges from 0.0 to 1.0
"""
def filter_infeasible_paths(
self, paths: List[Any], min_confidence: float = 0.8
) -> List[Any]:
"""
Filter a list of taint paths, removing infeasible ones.
Args:
paths: List of taint path objects (must have .path_nodes attribute —
list of node IDs; first element = source, last = sink)
min_confidence: Minimum confidence to consider a result reliable (default: 0.8)
Returns:
Filtered list with only feasible paths
"""
def get_statistics(self) -> Dict[str, Any]:
"""
Get solver statistics.
Returns:
Dictionary with keys:
- z3_available (bool): Whether Z3 solver is installed
- max_constraints (int): Maximum constraints per path
- solver_timeout_ms (int): LIA query timeout
- solver_timeout_uf_ms (int): UF query timeout
- enable_function_models (bool): Function models enabled
- enable_arithmetic (bool): Arithmetic enabled
- supported_functions (list): Known function models
"""
Usage¶
Direct API¶
from src.analysis.dataflow.symbolic_execution import PathConstraintTracker
tracker = PathConstraintTracker(cpg_service, max_constraints=20)
# Check a single path
is_feasible, confidence = tracker.is_path_feasible(node_123, node_456)
if not is_feasible and confidence > 0.9:
print("Path is infeasible — likely false positive")
# Filter a batch of taint paths
feasible_paths = tracker.filter_infeasible_paths(all_paths, min_confidence=0.8)
Integration with Taint Analysis¶
from src.analysis.dataflow.taint_analysis import TaintPropagator
propagator = TaintPropagator(
cpg_service,
enable_symbolic_execution=True # Enables Z3 path filtering
)
# Symbolic execution is applied automatically after taint propagation
paths = propagator.find_taint_paths(
sources=["recv", "getenv"],
sinks=["strcpy", "sprintf"]
)
# paths already filtered — infeasible paths removed
Supported Constraint Types¶
| Constraint | Example | Z3 Encoding |
|---|---|---|
| Integer comparison (var op const) | x > 100 |
z3.Int('x') > 100 |
| Integer comparison (var op var) | a > b |
z3.Int('a') > z3.Int('b') |
| Equality | x == 0 |
z3.Int('x') == 0 |
| Inequality | x != 5 |
z3.Int('x') != 5 |
| Boolean flag | flag |
z3.Int('flag') != 0 |
| Logical AND | x > 0 && y < 10 |
z3.And(x > 0, y < 10) |
| Logical OR | x > 0 \|\| y < 10 |
z3.Or(x > 0, y < 10) |
| Logical NOT | !flag |
z3.Not(flag != 0) |
| Arithmetic | x + 1 > 10 |
z3.Int('x') + 1 > 10 |
| Function model | strlen(s) > 10 |
z3.Int('strlen_s') > 10 (with strlen_s >= 0) |
| NULL check | ptr == NULL |
z3.Bool('ptr_is_null') == True |
| nullptr check | ptr != nullptr |
z3.Bool('ptr_is_null') == False |
Not supported: floating-point comparisons, string operations, interprocedural constraints.
Function Models¶
Known pure functions are modeled as uninterpreted functions with domain constraints:
| Function | Domain Constraint | Languages |
|---|---|---|
strlen, strnlen, wcslen |
result >= 0 |
C/C++ |
sizeof |
result > 0 |
C/C++ |
len, length |
result >= 0 |
Go, Python, Java |
size, count, cap |
result >= 0 |
Go, C++ STL |
abs |
result >= 0 |
All |
Same function on the same argument reuses the same Z3 variable, enabling contradiction detection (e.g., strlen(s) > 10 && strlen(s) < 5 → infeasible).
Confidence Levels¶
| Solver Result | Feasible? | Confidence | Action |
|---|---|---|---|
sat |
Yes | 0.9 | Keep path |
unsat |
No | 0.95 | Remove path (false positive) |
unknown |
Assumed yes | 0.4 | Keep path (conservative) |
| Z3 unavailable | Assumed yes | 0.5 | Keep path (degraded mode) |
| Too many constraints | Assumed yes | 0.3 | Keep path (skip analysis) |
Configuration¶
Via config.yaml:
symbolic_execution:
enabled: true
max_constraints: 20
solver_timeout_ms: 500
solver_timeout_uf_ms: 2000
max_parse_depth: 10
enable_function_models: true
enable_arithmetic: true
Performance Impact¶
| Metric | Without Z3 | With Z3 | Delta |
|---|---|---|---|
| Analysis time | Baseline | +20-25% | Solver overhead |
| Memory | Baseline | +25-30% | Z3 internal structures |
| False positive rate | ~12% | ~1-3% | Infeasible paths removed |
Limitations¶
- Lightweight only — not a full symbolic executor; only tracks branch conditions
- No floating-point — no
float/doubleconstraint support (100-1000x Z3 slowdown) - No string operations — no
strcmp, string equality (50-500x Z3 slowdown) - Max 20 constraints — longer paths assumed feasible to avoid timeout
- No interprocedural — constraints collected only within a single function’s CFG
- Linear arithmetic only —
var * constis supported,var * varis not
Related Documentation¶
- Dataflow Analysis Guide — taint analysis integration
- Analysis Modules — CFG, DDG, CDG analysis modules
- Security Reference — security analysis overview
- Hypothesis System — hypothesis-driven security validation
Module: src/analysis/dataflow/symbolic_execution.py
Last updated: March 2026