Symbolic Execution (Z3 SMT Solver)

Symbolic execution engine for path feasibility checking in taint analysis. Uses the Z3 SMT solver to filter out infeasible taint paths, reducing false positive rate to 1-3%.

Overview¶

When taint analysis finds a path from source to sink, the path may traverse branches with contradictory conditions (e.g., if (x > 10) followed by if (x < 5) on the same variable). Symbolic execution checks whether such paths are actually feasible by encoding branch conditions as Z3 constraints and calling the solver.

Key characteristics: - Optional component — gracefully degrades when Z3 is not installed - Adds ~20-25% time overhead, ~25-30% memory overhead - Maximum 20 constraints per path (configurable) - Recursive descent parser for condition expressions - Supports integer comparisons, boolean logic, arithmetic, function models, NULL checks - Branch direction detection via CFG successor analysis - Configurable solver timeouts (500ms for LIA, 2s for UF queries)

Installation¶

pip install z3-solver

Z3 is already included in requirements.txt. If Z3 is unavailable at runtime, the system logs a warning and returns conservative estimates (all paths assumed feasible).

Architecture¶

TaintPropagator (enable_symbolic_execution=True)
    │
    ├─ find_taint_paths()  →  raw taint paths
    │
    └─ PathConstraintTracker.filter_infeasible_paths()
        │
        ├─ _collect_constraints(start, end)
        │   ├─ Walk CFG edges, extract branch conditions from CDG
        │   └─ _determine_branch_direction() via CFG successors
        │
        ├─ _parse_condition(code, z3_vars, state)
        │   └─ Recursive descent: or → and → not → comparison → additive → atom
        │       ├─ Variables: z3.Int('x')
        │       ├─ Functions: strlen(s) → fresh Int with domain constraints
        │       └─ NULL: z3.Bool('ptr_is_null')
        │
        └─ z3.Solver().check()  (with timeout)
            ├─ sat     → path feasible (confidence 0.9)
            ├─ unsat   → path infeasible, FILTER OUT (confidence 0.95)
            └─ unknown → conservative, keep path (confidence 0.4)

API Reference¶

PathConstraint¶

@dataclass
class PathConstraint:
    node_id: int       # CPG node ID of the branch
    condition: str     # Condition expression (e.g., "x > 100")
    location: str      # File:line location
    negated: bool      # True if this is the else-branch

SymbolicExecutionConfig¶

@dataclass
class SymbolicExecutionConfig:
    enabled: bool = True
    max_constraints: int = 20
    solver_timeout_ms: int = 500        # QF_LIA queries
    solver_timeout_uf_ms: int = 2000    # Queries with uninterpreted functions
    max_parse_depth: int = 10           # Recursive descent depth limit
    enable_function_models: bool = True # strlen, sizeof, len models
    enable_arithmetic: bool = True      # +, -, * in conditions

PathConstraintTracker¶

class PathConstraintTracker:
    def __init__(self, cpg_service, max_constraints: int = 20,
                 config: Optional[SymbolicExecutionConfig] = None):
        """
        Args:
            cpg_service: CPGQueryService instance for graph queries
            max_constraints: Maximum constraints per path (default: 20).
                Paths with more constraints are assumed feasible to avoid solver timeout.
            config: Optional configuration for solver timeouts and features.
        """

    def is_path_feasible(self, start_node: int, end_node: int) -> Tuple[bool, float]:
        """
        Check if a path between two nodes is feasible.

        Returns:
            (is_feasible, confidence) — confidence ranges from 0.0 to 1.0
        """

    def filter_infeasible_paths(
        self, paths: List[Any], min_confidence: float = 0.8
    ) -> List[Any]:
        """
        Filter a list of taint paths, removing infeasible ones.

        Args:
            paths: List of taint path objects (must have .path_nodes attribute —
                   list of node IDs; first element = source, last = sink)
            min_confidence: Minimum confidence to consider a result reliable (default: 0.8)

        Returns:
            Filtered list with only feasible paths
        """

    def get_statistics(self) -> Dict[str, Any]:
        """
        Get solver statistics.

        Returns:
            Dictionary with keys:
            - z3_available (bool): Whether Z3 solver is installed
            - max_constraints (int): Maximum constraints per path
            - solver_timeout_ms (int): LIA query timeout
            - solver_timeout_uf_ms (int): UF query timeout
            - enable_function_models (bool): Function models enabled
            - enable_arithmetic (bool): Arithmetic enabled
            - supported_functions (list): Known function models
        """

Usage¶

Direct API¶

from src.analysis.dataflow.symbolic_execution import PathConstraintTracker

tracker = PathConstraintTracker(cpg_service, max_constraints=20)

# Check a single path
is_feasible, confidence = tracker.is_path_feasible(node_123, node_456)
if not is_feasible and confidence > 0.9:
    print("Path is infeasible — likely false positive")

# Filter a batch of taint paths
feasible_paths = tracker.filter_infeasible_paths(all_paths, min_confidence=0.8)

Integration with Taint Analysis¶

from src.analysis.dataflow.taint_analysis import TaintPropagator

propagator = TaintPropagator(
    cpg_service,
    enable_symbolic_execution=True  # Enables Z3 path filtering
)

# Symbolic execution is applied automatically after taint propagation
paths = propagator.find_taint_paths(
    sources=["recv", "getenv"],
    sinks=["strcpy", "sprintf"]
)
# paths already filtered — infeasible paths removed

Supported Constraint Types¶

Constraint	Example	Z3 Encoding
Integer comparison (var op const)	`x > 100`	`z3.Int('x') > 100`
Integer comparison (var op var)	`a > b`	`z3.Int('a') > z3.Int('b')`
Equality	`x == 0`	`z3.Int('x') == 0`
Inequality	`x != 5`	`z3.Int('x') != 5`
Boolean flag	`flag`	`z3.Int('flag') != 0`
Logical AND	`x > 0 && y < 10`	`z3.And(x > 0, y < 10)`
Logical OR	`x > 0 \\|\\| y < 10`	`z3.Or(x > 0, y < 10)`
Logical NOT	`!flag`	`z3.Not(flag != 0)`
Arithmetic	`x + 1 > 10`	`z3.Int('x') + 1 > 10`
Function model	`strlen(s) > 10`	`z3.Int('strlen_s') > 10` (with `strlen_s >= 0`)
NULL check	`ptr == NULL`	`z3.Bool('ptr_is_null') == True`
nullptr check	`ptr != nullptr`	`z3.Bool('ptr_is_null') == False`

Not supported: floating-point comparisons, string operations, interprocedural constraints.

Function Models¶

Known pure functions are modeled as uninterpreted functions with domain constraints:

Function	Domain Constraint	Languages
`strlen`, `strnlen`, `wcslen`	`result >= 0`	C/C++
`sizeof`	`result > 0`	C/C++
`len`, `length`	`result >= 0`	Go, Python, Java
`size`, `count`, `cap`	`result >= 0`	Go, C++ STL
`abs`	`result >= 0`	All

Same function on the same argument reuses the same Z3 variable, enabling contradiction detection (e.g., strlen(s) > 10 && strlen(s) < 5 → infeasible).

Confidence Levels¶

Solver Result	Feasible?	Confidence	Action
`sat`	Yes	0.9	Keep path
`unsat`	No	0.95	Remove path (false positive)
`unknown`	Assumed yes	0.4	Keep path (conservative)
Z3 unavailable	Assumed yes	0.5	Keep path (degraded mode)
Too many constraints	Assumed yes	0.3	Keep path (skip analysis)

Configuration¶

Via config.yaml:

symbolic_execution:
  enabled: true
  max_constraints: 20
  solver_timeout_ms: 500
  solver_timeout_uf_ms: 2000
  max_parse_depth: 10
  enable_function_models: true
  enable_arithmetic: true

Performance Impact¶

Metric	Without Z3	With Z3	Delta
Analysis time	Baseline	+20-25%	Solver overhead
Memory	Baseline	+25-30%	Z3 internal structures
False positive rate	~12%	~1-3%	Infeasible paths removed

Limitations¶

Lightweight only — not a full symbolic executor; only tracks branch conditions
No floating-point — no float/double constraint support (100-1000x Z3 slowdown)
No string operations — no strcmp, string equality (50-500x Z3 slowdown)
Max 20 constraints — longer paths assumed feasible to avoid timeout
No interprocedural — constraints collected only within a single function’s CFG
Linear arithmetic only — var * const is supported, var * var is not

Dataflow Analysis Guide — taint analysis integration
Analysis Modules — CFG, DDG, CDG analysis modules
Security Reference — security analysis overview
Hypothesis System — hypothesis-driven security validation

Module: src/analysis/dataflow/symbolic_execution.py Last updated: March 2026