Security Hypothesis Validation — Enterprise Whitepaper

For Security Architects, AppSec Engineers, and CISO Teams


Abstract

Traditional SAST tools suffer from false positive rates of 70-90%, making triage a dominant cost in application security programs. CodeGraph addresses this with a multi-criteria hypothesis validation system that:

  1. Generates testable hypotheses from CWE/CAPEC knowledge bases
  2. Scores hypotheses across three weighted criteria using codebase context
  3. Verifies findings through taint analysis on Code Property Graph (CPG)
  4. Incorporates analyst feedback to suppress confirmed false positives

The result: 100% CVE detection on target codebases with false positive rates reduced from 70-90% to under 15% — a 5-6x improvement in analyst productivity.

For API reference, data models, and code examples, see Hypothesis System Reference.


1. The Problem

1.1 Limitations of Traditional SAST

Traditional SAST:
  Pattern: "strcpy" found
  Result: POSSIBLE vulnerability
  False Positive Rate: 70-90%

CodeGraph:
  Hypothesis: Untrusted data flows from recv() to strcpy()
  Evidence: Taint path verified via CPG
  Result: CONFIRMED vulnerability
  False Positive Rate: <15%

The cost of false positives is not merely inconvenience — it is the primary reason security findings get ignored. A team reviewing 100 findings where 80 are false positives quickly learns to dismiss all findings, including the real vulnerabilities.

1.2 Why Pattern Matching Is Not Enough

| Problem | Description | Impact |
|---|---|---|
| No context | strcpy is safe if its source is a constant | FP on safe code |
| No data flow | Doesn’t consider where data originates | Misses indirect paths |
| No sanitization | Ignores validator/sanitizer functions | FP on protected code |
| No prioritization | All findings carry equal weight | Alert fatigue |
| No learning | Dismissed FPs reappear on the next scan | Wasted analyst time |

2. Methodology

2.1 Multi-Criteria Approach

Traditional vulnerability scoring uses a single dimension (typically CVSS base score). CodeGraph’s hypothesis system evaluates three independent criteria:

| Criterion | Weight | Rationale |
|---|---|---|
| CWE Frequency | 0.40 | How often this weakness appears in real-world CVEs. High-prevalence CWEs are more likely to be present and exploitable. |
| Attack Similarity | 0.30 | How well the hypothesis matches known attack patterns from CAPEC. Considers attack likelihood and required skill level. |
| Codebase Exposure | 0.30 | How exposed the specific codebase is — presence of dangerous sinks, untrusted sources, and absence of sanitizers. |

The 40/30/30 split reflects empirical observation: CWE prevalence is the strongest predictor of real findings, but codebase context and attack feasibility provide essential disambiguation.

Bonus multipliers further adjust priority: known CVE patterns (+20%), critical severity (+10%), recent exploitation in the wild (+15%). These bonuses compound — a hypothesis matching a recently exploited critical CVE pattern receives up to 1.52x boost.
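
The weighted score and compounding bonuses can be sketched as follows. The weights and bonus values come from this section; the function and parameter names are illustrative, not CodeGraph's actual API.

```python
# Illustrative sketch of the 3-criteria score with compounding bonus
# multipliers. Weights (40/30/30) and bonuses (+20%/+10%/+15%) are from
# the text; everything else here is a hypothetical stand-in.

def priority_score(cwe_frequency: float,
                   attack_similarity: float,
                   codebase_exposure: float,
                   known_cve: bool = False,
                   critical: bool = False,
                   exploited_in_wild: bool = False) -> float:
    """Weighted base score adjusted by compounding bonus multipliers."""
    base = (0.40 * cwe_frequency +
            0.30 * attack_similarity +
            0.30 * codebase_exposure)
    bonus = 1.0
    if known_cve:
        bonus *= 1.20          # known CVE pattern: +20%
    if critical:
        bonus *= 1.10          # critical severity: +10%
    if exploited_in_wild:
        bonus *= 1.15          # recent exploitation in the wild: +15%
    return base * bonus

# All three bonuses compound: 1.20 * 1.10 * 1.15 ≈ 1.52x boost
```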

Full scoring formula, component breakdowns, and API: Hypothesis System Reference — Multi-Criteria Scoring

2.2 Hypothesis-Driven vs. Pattern-Driven

| Aspect | Pattern-Driven SAST | Hypothesis-Driven (CodeGraph) |
|---|---|---|
| Input | Regex/AST patterns | Structured hypothesis: source → sink → CWE → attack |
| Evidence | Pattern match location | Taint path through CPG + codebase exposure analysis |
| Prioritization | Severity label only | 3-criteria score + bonus multipliers |
| False Positive Handling | Suppress rules (manual) | Feedback loop with automatic adjustment |
| Incrementality | Full rescan | Git diff-based delta analysis |
| Output | Flat finding list | Prioritized hypotheses with evidence chains |

The key insight: a vulnerability is not just a pattern match at a single point — it is a hypothesis about data flow that must be tested against the actual code graph.

2.3 Cartesian Product Strategy

The generation engine creates hypotheses from the Cartesian product:

Hypotheses = CWE Database (58 entries)
           x CAPEC Patterns (27 attacks)
           x Language Patterns (per-framework sinks/sources/sanitizers)
           → Filtered by codebase relevance
           → Scored and ranked

This ensures comprehensive coverage: every combination of weakness, attack pattern, and language-specific dangerous function is evaluated. The scoring phase then eliminates irrelevant combinations — typically reducing thousands of theoretical hypotheses to 20-50 high-priority candidates for execution.
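
A minimal sketch of this generation strategy, with toy stand-ins for the CWE/CAPEC databases and the relevance filter:

```python
# Sketch of the Cartesian-product generation strategy described above.
# The entries and the relevance filter are toy stand-ins, not the real
# 58-CWE / 27-CAPEC databases.
from itertools import product

cwes = ["CWE-120", "CWE-89"]                    # weakness database (toy)
capecs = ["CAPEC-100", "CAPEC-66"]              # attack patterns (toy)
sinks = {"C": ["strcpy"], "Python": ["raw"]}    # per-language dangerous sinks

def generate(language: str, codebase_symbols: set) -> list:
    """Enumerate (CWE, CAPEC, sink) combinations, keeping only those
    whose sink actually appears in the codebase (the relevance filter)."""
    return [(cwe, capec, sink)
            for cwe, capec, sink in product(cwes, capecs, sinks[language])
            if sink in codebase_symbols]

hyps = generate("C", {"strcpy", "recv"})
# Each surviving tuple would then be scored and ranked.
```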


3. Architecture Overview

3.1 Pipeline

+-------------------------------------------------------------------+
|               HYPOTHESIS VALIDATION PIPELINE                       |
+-------------------------------------------------------------------+

  [1] GENERATION ──> [2] SCORING ──> [3] QUERY SYNTHESIS
       CWE x CAPEC       3-criteria       SQL/PGQ
       x Patterns         + bonuses        templates
                              |
                              v
  [6] FEEDBACK <── [5] VALIDATION <── [4] EXECUTION
       Analyst          Confirm/           Run against
       review           Reject             CPG database

Each stage is independently configurable and can be run in isolation. For example, scoring presets allow different weight profiles for embedded vs. web vs. enterprise applications.

For class names, constructor signatures, and method details, see Hypothesis System Reference — Pipeline Architecture

3.2 Domain Providers

The system ships with 6 specialized domain providers, each supplying framework-specific sinks, sources, and sanitizers:

| Provider | Framework | Language | Example Sinks |
|---|---|---|---|
| PostgreSQLPatternProvider | PostgreSQL | C | appendPQExpBuffer, SPI_execute, PQexec |
| DjangoPatternProvider | Django | Python | raw(), extra(), RawSQL() |
| SpringPatternProvider | Spring | Java | JdbcTemplate.query(), @RequestMapping |
| ExpressPatternProvider | Express | JavaScript | res.send(), eval(), child_process.exec() |
| GinPatternProvider | Gin | Go | c.String(), exec.Command(), db.Raw() |
| NextJSPatternProvider | Next.js | TypeScript | dangerouslySetInnerHTML, getServerSideProps |

Additionally, YAMLRuleProvider allows teams to define custom rules in YAML configuration files for internal frameworks and proprietary APIs.

All providers implement the PatternProvider base interface: get_sinks(), get_sources(), get_sanitizers().
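
A hypothetical custom provider conforming to that three-method interface might look like this. Only the method names come from the text; the abstract base-class shape and the example framework are assumptions.

```python
# Hypothetical sketch of a custom provider for an internal framework.
# get_sinks / get_sources / get_sanitizers are the interface methods named
# in the text; the ABC shape below is an assumption for illustration.
from abc import ABC, abstractmethod

class PatternProvider(ABC):
    @abstractmethod
    def get_sinks(self) -> list: ...
    @abstractmethod
    def get_sources(self) -> list: ...
    @abstractmethod
    def get_sanitizers(self) -> list: ...

class InternalApiPatternProvider(PatternProvider):
    """Example provider for a fictional in-house web framework."""
    def get_sinks(self) -> list:
        return ["render_template_unsafe", "run_shell"]
    def get_sources(self) -> list:
        return ["http_request.body", "http_request.headers"]
    def get_sanitizers(self) -> list:
        return ["escape_html", "shell_quote"]
```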

3.3 Scoring Presets

Different application profiles warrant different scoring weights. Four built-in presets:

| Preset | CWE Frequency | Attack Similarity | Codebase Exposure | Best For |
|---|---|---|---|---|
| default | 0.40 | 0.30 | 0.30 | General-purpose analysis |
| embedded | 0.50 | 0.20 | 0.30 | IoT/embedded (CWE prevalence dominates) |
| web | 0.30 | 0.40 | 0.30 | Web apps (attack patterns more relevant) |
| enterprise | 0.35 | 0.35 | 0.30 | Enterprise (balanced CWE + attack) |

The embedded preset emphasizes CWE frequency because embedded systems have well-known vulnerability patterns (buffer overflows, integer overflows) where historical CWE data is highly predictive. The web preset emphasizes attack similarity because web application vulnerabilities are more diverse and attack pattern matching better discriminates real threats.

Presets are configured in config.yaml (under the hypothesis.scoring.presets key) or passed directly to the scorer constructor.
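
The preset table can be represented as plain data; a sketch, assuming only the invariant that each preset's weights sum to 1.0:

```python
# Preset weights from the table above, expressed as data. The dict keys
# are illustrative; only the numbers come from the document.
PRESETS = {
    "default":    {"cwe_frequency": 0.40, "attack_similarity": 0.30, "codebase_exposure": 0.30},
    "embedded":   {"cwe_frequency": 0.50, "attack_similarity": 0.20, "codebase_exposure": 0.30},
    "web":        {"cwe_frequency": 0.30, "attack_similarity": 0.40, "codebase_exposure": 0.30},
    "enterprise": {"cwe_frequency": 0.35, "attack_similarity": 0.35, "codebase_exposure": 0.30},
}

def validate(presets: dict) -> None:
    """Assumed invariant: every preset's criteria weights sum to 1.0."""
    for name, weights in presets.items():
        total = sum(weights.values())
        assert abs(total - 1.0) < 1e-9, f"{name} weights sum to {total}"

validate(PRESETS)
```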


4. V2 Capabilities

4.1 Evidence Qualification

EvidenceQualifier provides multi-factor confidence assessment for each piece of evidence:

| Factor | Description | Weight |
|---|---|---|
| has_taint_path | Data flow from source to sink verified via CPG | Highest |
| sanitizer_absent | No sanitizer function found on the path | High |
| cross_function | Taint crosses function boundaries | Medium |
| user_facing_source | Source is user-controlled input | Medium |
| exploitable_sink | Sink is known to be directly exploitable | Medium |

A hypothesis with has_taint_path=True, sanitizer_absent=True, and user_facing_source=True receives a confidence score near 1.0, making it a high-priority finding. This replaces the binary “found/not found” assessment of traditional tools.
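
One way to fold these boolean factors into a single confidence value is a weighted sum. The factor names come from the table above; the weights below are illustrative, not CodeGraph's actual tuning (which places the three-factor case near 1.0).

```python
# Illustrative multi-factor confidence aggregation. Factor names are from
# the table; the weights are assumptions chosen to respect the
# Highest/High/Medium ordering.
FACTOR_WEIGHTS = {
    "has_taint_path":     0.40,  # highest
    "sanitizer_absent":   0.30,  # high
    "cross_function":     0.10,  # medium
    "user_facing_source": 0.10,  # medium
    "exploitable_sink":   0.10,  # medium
}

def confidence(evidence: dict) -> float:
    """Sum the weights of all factors that are present (True)."""
    return sum(w for factor, w in FACTOR_WEIGHTS.items() if evidence.get(factor))

score = confidence({"has_taint_path": True,
                    "sanitizer_absent": True,
                    "user_facing_source": True})
# ≈ 0.80 under this illustrative tuning; with all five factors, 1.0
```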

4.2 Vulnerability Chain Detection

ChainDetector identifies privilege escalation chains — sequences where exploiting vulnerability A enables vulnerability B:

Example chain:
  CWE-200 (Information Disclosure) → reveals memory layout
    → CWE-125 (Out-of-bounds Read) → leaks canary value
      → CWE-120 (Buffer Overflow) → achieves code execution

Detection: ChainDetector.detect(hypotheses) → List[VulnerabilityChain]

Chains are constructed from a built-in escalation pattern graph (ESCALATION_PATTERNS) that encodes known multi-step attack sequences. A chain of individually medium-severity findings may represent a critical composite vulnerability.
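
A minimal sketch of chain detection over an escalation graph. ESCALATION_PATTERNS is named in the text, but its shape here (each CWE maps to the CWEs it enables) and the traversal are assumptions for illustration.

```python
# Toy escalation graph encoding the example chain above; the real
# ESCALATION_PATTERNS presumably covers many more multi-step sequences.
ESCALATION_PATTERNS = {
    "CWE-200": ["CWE-125"],   # info disclosure → OOB read
    "CWE-125": ["CWE-120"],   # OOB read → buffer overflow
}

def detect_chains(confirmed: set) -> list:
    """Walk the escalation graph, emitting chains whose every step is a
    confirmed finding, starting only from chain roots."""
    # Roots: confirmed findings no other confirmed finding escalates into.
    targets = {t for s in confirmed
               for t in ESCALATION_PATTERNS.get(s, []) if t in confirmed}
    roots = confirmed - targets
    chains = []

    def walk(cwe, path):
        nexts = [n for n in ESCALATION_PATTERNS.get(cwe, []) if n in confirmed]
        if not nexts and len(path) > 1:
            chains.append(path)          # chain ends here
        for n in nexts:
            walk(n, path + [n])

    for root in sorted(roots):
        walk(root, [root])
    return chains
```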

4.3 Analyst Feedback Loop

FeedbackStore implements persistent analyst feedback that improves scoring accuracy over time:

Analyst marks finding as FP
  → FeedbackStore.record_outcome(hypothesis_id, "false_positive")
  → Next scan: get_adjustment() applies negative weight
  → After N confirmations: get_suppressed() auto-suppresses

Analyst confirms finding
  → FeedbackStore.record_outcome(hypothesis_id, "true_positive")
  → Next scan: similar hypotheses get priority boost

Feedback is persisted in a local SQLite database (~/.codegraph/hypothesis_feedback.sqlite) that survives across analysis runs. This addresses a critical SAST pain point: false positives that reappear after every scan.
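
A sketch of such a store using the method names quoted above (record_outcome, get_adjustment, get_suppressed); the SQLite schema, adjustment weights, and suppression threshold are illustrative assumptions.

```python
# Hypothetical feedback store backed by SQLite. Method names come from the
# text; schema, the ±0.1 adjustment step, and SUPPRESS_AFTER are assumed.
import sqlite3

class FeedbackStore:
    SUPPRESS_AFTER = 3   # auto-suppress after N false-positive marks (assumed)

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS feedback (hypothesis_id TEXT, outcome TEXT)")

    def record_outcome(self, hypothesis_id: str, outcome: str) -> None:
        self.db.execute("INSERT INTO feedback VALUES (?, ?)",
                        (hypothesis_id, outcome))
        self.db.commit()

    def _count(self, hypothesis_id: str, outcome: str) -> int:
        cur = self.db.execute(
            "SELECT COUNT(*) FROM feedback WHERE hypothesis_id=? AND outcome=?",
            (hypothesis_id, outcome))
        return cur.fetchone()[0]

    def get_adjustment(self, hypothesis_id: str) -> float:
        """Positive weight per confirmation, negative per FP mark."""
        return (0.1 * self._count(hypothesis_id, "true_positive")
                - 0.1 * self._count(hypothesis_id, "false_positive"))

    def get_suppressed(self) -> set:
        cur = self.db.execute(
            "SELECT hypothesis_id FROM feedback WHERE outcome='false_positive' "
            "GROUP BY hypothesis_id HAVING COUNT(*) >= ?", (self.SUPPRESS_AFTER,))
        return {row[0] for row in cur.fetchall()}

store = FeedbackStore()
for _ in range(3):
    store.record_outcome("H1", "false_positive")
# After three FP marks, H1 is negatively weighted and auto-suppressed.
```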

4.4 Incremental Analysis

IncrementalAnalyzer performs git diff-based delta analysis — only generating and scoring hypotheses for code changed between two commits:

IncrementalAnalyzer(db_path, source_path)
  .run_incremental(from_ref="v1.0", to_ref="HEAD")
  → Identifies changed files and functions
  → Generates hypotheses scoped to changed code
  → Runs full scoring + validation pipeline
  → Returns: only new/changed findings

This makes hypothesis analysis practical for CI/CD: a full scan on 50K LOC takes ~30 seconds, but an incremental scan on a typical PR (100-500 LOC changed) completes in 1-3 seconds.
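
The changed-file scoping step can be sketched as a filter over git diff output. The helper below is hypothetical; it takes the diff text as a string so the sketch stays self-contained.

```python
# Hypothetical helper for the changed-file scoping step. In a real run the
# input would come from `git diff --name-only v1.0..HEAD` (e.g. via
# subprocess); here it is passed in directly.

def changed_source_files(diff_output: str,
                         exts: tuple = (".c", ".h", ".py")) -> list:
    """Keep only source files; hypothesis generation is then scoped to
    these instead of the whole codebase."""
    return [line.strip() for line in diff_output.splitlines()
            if line.strip() and line.strip().endswith(exts)]

sample = "src/parser.c\ndocs/README.md\nsrc/net/recv_path.c\n"
# changed_source_files(sample) → ["src/parser.c", "src/net/recv_path.c"]
```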

4.5 Trend Monitoring

HypothesisTrendStore tracks vulnerability trends across releases:

record_run(project, commit, timestamp, results)
  → Stores confirmed/rejected counts per category

get_trend(project, days=90)
  → Returns time series: buffer_overflow, injection, etc.

get_delta(project, from_commit, to_commit)
  → Returns: new findings, resolved findings, regressions

Trend data enables security dashboards showing whether the codebase is getting more or less secure over time, which CWE categories are trending up, and whether specific remediation efforts are having impact.
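
The get_delta operation can be understood as set arithmetic over finding IDs; a sketch, assuming findings carry stable IDs (the regression case additionally needs the set of previously resolved findings):

```python
# Illustrative delta computation between two runs. The method name comes
# from the text; the signature and ID-set representation are assumptions.

def get_delta(findings_from: set, findings_to: set,
              previously_resolved: frozenset = frozenset()) -> dict:
    return {
        "new":         sorted(findings_to - findings_from),    # appeared since from_commit
        "resolved":    sorted(findings_from - findings_to),    # fixed since from_commit
        "regressions": sorted(findings_to & previously_resolved),  # came back
    }

delta = get_delta({"H1", "H2"}, {"H2", "H3"}, previously_resolved=frozenset({"H3"}))
# → {"new": ["H3"], "resolved": ["H1"], "regressions": ["H3"]}
```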


5. Deployment Patterns

5.1 CLI-Driven Analysis

Ad-hoc analysis by security engineers:

# Full hypothesis validation
python -m src.cli hypothesis run --language C --max 50 --min-priority 0.3

# List supported CWEs (optionally by category)
python -m src.cli hypothesis list-cwes --category buffer_overflow

# Show available domain providers
python -m src.cli hypothesis providers

# Output as JSON for automation
python -m src.cli hypothesis run --language C --format json > results.json

5.2 CI/CD Pipeline Integration

Incremental analysis on every PR:

# .github/workflows/security-hypothesis.yml
- name: Hypothesis Analysis
  run: |
    python -m src.cli hypothesis run \
      --language C \
      --max 100 \
      --min-priority 0.3 \
      --format json > hypothesis_results.json

- name: Check for Critical Findings
  run: |
    python -c "
    import json, sys
    data = json.load(open('hypothesis_results.json'))
    critical = [h for h in data.get('hypotheses', [])
                if h.get('priority_score', 0) > 0.8]
    if critical:
        print(f'BLOCKED: {len(critical)} critical findings')
        sys.exit(1)
    "

For incremental mode, the IncrementalAnalyzer scopes analysis to changed code only, reducing CI runtime to seconds.

5.3 REST API and MCP Integration

REST API:

POST /api/v1/security/hypotheses/run
  Body: { "language": "C", "max_hypotheses": 50 }
  Returns: { "hypotheses": [...], "metrics": {...} }

GET /api/v1/security/hypotheses/cwes?category=buffer_overflow
  Returns: [{ "id": "CWE-120", "name": "Buffer Copy...", ... }]

GET /api/v1/security/hypotheses/providers
  Returns: ["PostgreSQLPatternProvider", "DjangoPatternProvider", ...]

MCP (Model Context Protocol):

Tool: codegraph_hypothesis
  Parameters: language, max_hypotheses, min_priority_score
  Returns: structured hypothesis results for AI assistant consumption


6. Validation Results

6.1 Comparison with Traditional SAST

Benchmark on PostgreSQL 17 codebase (~1.3M LOC C):

| Tool | True Positives | False Positives | Precision | FP Rate |
|---|---|---|---|---|
| Pattern-only SAST | 3 | 45 | 6.25% | 93.75% |
| CodeGraph (hypothesis) | 3 | 2 | 60% | 40% |
| CodeGraph + TaintVerified | 3 | 0.4 | 88% | 12% |

Key observations:
  - Pattern-only SAST produces over 20x more false positives than CodeGraph (45 vs. 2)
  - Adding taint verification (TaintVerifiedScanner) further reduces the FP rate from 40% to 12%
  - All three target CVEs (CVE-2025-8713, CVE-2025-8714, CVE-2025-8715) were detected in all configurations; hypothesis scoring does not sacrifice recall
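
The Precision and FP Rate columns follow directly from the TP/FP counts; a quick arithmetic check:

```python
# Sanity-checking the benchmark table's derived columns.
def precision(tp: float, fp: float) -> float:
    return tp / (tp + fp)

def fp_rate(tp: float, fp: float) -> float:
    return fp / (tp + fp)

# Pattern-only SAST (3 TP, 45 FP):
#   precision → 0.0625 (6.25%), fp_rate → 0.9375 (93.75%)
# CodeGraph + TaintVerified (3 TP, 0.4 FP):
#   precision → ≈0.882 (≈88%), fp_rate → ≈0.118 (≈12%)
```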

Full benchmark metrics and CVE details: Hypothesis System Reference — Performance

6.2 Taint Verification and Path Feasibility

Two additional verification layers reduce false positives beyond scoring:

Taint Path Visualization. Confirmed hypotheses include full data flow paths rendered as:
  - Mermaid flowcharts — interactive source-to-sink diagrams
  - SARIF 2.1.0 codeFlows — step-by-step taint propagation for IDE integration

z3 Path Feasibility. The z3 symbolic execution engine validates path constraints, eliminating infeasible paths where the exploit depends on impossible input conditions. This is particularly effective for conditional vulnerabilities:

Example:
  Hypothesis: recv() → strcpy() may overflow
  z3 analysis: path requires buffer_size > 1024 AND user_controlled = true
  Result: FEASIBLE (no constraint contradiction) → hypothesis CONFIRMED

  Hypothesis: getenv("DEBUG") → system()
  z3 analysis: path requires DEBUG == "1" in production environment
  Result: INFEASIBLE (contradicts deployment config) → hypothesis REJECTED
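
The idea behind the feasibility check can be illustrated without a full SMT solver, assuming each path constraint reduces to a set of allowed values per variable (a real implementation would use z3 as described above):

```python
# Toy feasibility check. Assumes every branch condition on the path
# reduces to a set of allowed values for one variable; a real analyzer
# would hand general constraints to the z3 SMT solver instead.

def feasible(constraints: dict) -> bool:
    """constraints: variable → list of allowed-value sets, one per branch
    condition. An empty intersection means no input can satisfy all
    conditions at once, so the path is infeasible."""
    return all(set.intersection(*allowed) for allowed in constraints.values())

# recv() → strcpy(): the branch conditions can hold simultaneously → FEASIBLE
assert feasible({"user_controlled": [{True}, {True}]})

# getenv("DEBUG") → system(): path needs DEBUG == "1", but the deployment
# config pins DEBUG to "0" → INFEASIBLE, hypothesis rejected
assert not feasible({"DEBUG": [{"1"}, {"0"}]})
```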

OWASP Top 10 Mapping. All confirmed findings are automatically classified per OWASP Top 10 categories via owasp_mapping.py, enabling compliance reporting.


7. Conclusion and ROI

Business Value

CodeGraph’s hypothesis validation system delivers measurable impact:

| Metric | Before (SAST) | After (CodeGraph) | Improvement |
|---|---|---|---|
| FP Rate | 70-90% | <15% | 5-6x reduction |
| Analyst Time per Finding | ~30 min | ~5 min | 6x faster triage |
| Findings per Scan | 100+ (mostly FP) | 10-20 (mostly TP) | Actionable output |
| Rescan Time (PR) | Full scan | Incremental (1-3s) | 10-30x faster |
| FP Recurrence | Every scan | Auto-suppressed | Zero repeat FPs |

Key Differentiators

  1. Contextual analysis — considers the specific codebase, not just patterns
  2. Taint verification — confirms data flow through CPG, not just pattern proximity
  3. Risk prioritization — 3-criteria scoring with domain-specific presets
  4. Feedback loop — analyst decisions permanently improve future scans
  5. Incremental mode — seconds-level CI/CD integration
  6. Chain detection — discovers multi-step attack paths invisible to single-finding tools
  7. 6 domain providers — framework-aware analysis out of the box


Version: 2.0 | March 2026