Scenario 07: Test Coverage

CPG-based untested code detection, test prioritization, test generation recommendations, runtime coverage import, and hybrid analysis.


Quick Start

/select 07

How It Works

Architecture

The test coverage module (src/workflow/scenarios/coverage_handlers/, 13 files) consists of 6 components:

User Query
    |
    v
CoverageIntentDetector (5 intent types, bilingual)
    |
    v
HandlerRegistry (priority-ordered dispatch)
    |
    +---> TestGenerationHandler (priority 5)
    |         generate test recommendations for specific functions
    |
    +---> UntestedCodeHandler (priority 10)
    |         find untested code via heuristic or hybrid detection
    |
    +---> TestPriorityHandler (priority 20)
    |         rank untested functions by criticality
    |
    v
CoverageReportFormatter (bilingual markdown output)
    |
    v
CallGraphAnalyzer ──> ImpactAnalysis (impact_score, callers, callees)

| Component | Module | Purpose |
|-----------|--------|---------|
| CoverageIntentDetector | intent_detector.py | Detect query intent among 5 types with morphological matching |
| TestGenerationHandler | handlers/test_generator.py | Generate test recommendations for named functions |
| UntestedCodeHandler | handlers/untested.py | Find untested code (heuristic + hybrid runtime detection) |
| TestPriorityHandler | handlers/priority.py | Rank untested functions by 4 scoring factors |
| CoverageReportFormatter | formatters/coverage_report.py | Format reports with FormatterLocalization (EN/RU) |
| TestCoverageScanner | handlers/coverage_scanner.py | Interface coverage scanning (dual scan: edges_call + file path heuristic) |

Intent Detection

CoverageIntentDetector classifies queries into 5 intent types using morphological keyword matching with Cyrillic word boundaries:

| Intent type | Priority | EN keywords | RU keywords |
|-------------|----------|-------------|-------------|
| test_generation | 5 | generate test, create test, write test, test suite, mutation test, stress test, property-based, chaos engineering | сгенерировать тест, создать тест, написать тест, модульный тест, стресс-тесты, мутационное тестирование |
| untested_code_scan | 10 | untested, no test, missing test, not covered, uncovered | непротестированный, без теста, нет тестов, непокрытый |
| test_priority | 20 | test priority, should test, critical test, high priority test | приоритет теста, протестировать, критический тест, важный тест |
| coverage_gap | 30 | coverage gap, low coverage, coverage report, test coverage | пробел покрытия, низкое покрытие, отчет покрытия, покрытие тестами |
| coverage_improvement | 40 | improve coverage, increase coverage, better coverage | улучшить покрытие, увеличить покрытие, повысить покрытие |

Additional extraction:

  • _extract_criticality(query) returns "critical", "high", "medium", or "all"
  • _extract_scope(query) returns "method", "class", "module", or "all"
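A minimal sketch of priority-based intent detection follows. It assumes simple stem-prefix matching as a stand-in for the real morphological matcher, uses hypothetical stem lists that cover only part of the keyword table, and assumes that the lowest priority number wins when several intents match, as it does for handlers.

```python
import re
from typing import Optional

# Hypothetical stems for four of the five intents (coverage_improvement omitted).
# Keywords are trimmed to stems so inflected Russian forms such as
# "непротестированные" still match "непротестирован".
INTENT_STEMS = {
    "test_generation": (5, ["generate test", "write test", "сгенерир", "написать тест"]),
    "untested_code_scan": (10, ["untested", "no test", "непротестирован", "без тест"]),
    "test_priority": (20, ["test priority", "should test", "приоритет тест"]),
    "coverage_gap": (30, ["coverage gap", "coverage report", "пробел покрыти"]),
}


def _stem_pattern(stem: str) -> re.Pattern:
    # For str patterns, \b is Unicode-aware in Python 3, so it also finds word
    # boundaries next to Cyrillic letters. Only the start of the stem is
    # anchored, so any inflectional ending may follow it.
    return re.compile(r"\b" + re.escape(stem), re.IGNORECASE)


PATTERNS = {
    intent: (priority, [_stem_pattern(s) for s in stems])
    for intent, (priority, stems) in INTENT_STEMS.items()
}


def detect_intent(query: str) -> Optional[str]:
    # Collect every matching intent, then keep the lowest priority number.
    matches = [
        (priority, intent)
        for intent, (priority, patterns) in PATTERNS.items()
        if any(p.search(query) for p in patterns)
    ]
    return min(matches)[1] if matches else None
```

For example, "Generate tests for untested code" matches both test_generation and untested_code_scan, and the lower number (5) decides.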

Handler Registry

Handlers are registered via HandlerRegistry("coverage") and dispatched in priority order, where a lower priority number means higher precedence:

@coverage_registry.register(priority=5)
class TestGenerationHandlerRegistered(TestGenerationHandler): ...

@coverage_registry.register(priority=10)
class UntestedCodeHandlerRegistered(UntestedCodeHandler): ...

@coverage_registry.register(priority=20)
class TestPriorityHandlerRegistered(TestPriorityHandler): ...

Each handler implements can_handle(query_info) -> bool and handle(query_info) -> HandlerResult. The registry tries handlers in priority order and uses the first one that matches.
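The dispatch rule can be illustrated with a toy registry. This is a sketch only: the internals of the real HandlerRegistry and HandlerResult are not documented here, so both classes below are hypothetical minimal versions, and the catch-all handler exists purely to show ordering.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class HandlerResult:
    # Hypothetical minimal result type; the real HandlerResult likely has more fields.
    handler: str
    answer: str


class HandlerRegistry:
    """Toy registry: a lower priority number is tried first."""

    def __init__(self, name: str):
        self.name = name
        self._entries: list[tuple[int, type]] = []

    def register(self, priority: int) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._entries.append((priority, cls))
            # list.sort is stable, so equal priorities keep registration order.
            self._entries.sort(key=lambda entry: entry[0])
            return cls
        return decorator

    def dispatch(self, query_info: dict[str, Any]) -> Optional[HandlerResult]:
        for _, cls in self._entries:
            handler = cls()
            if handler.can_handle(query_info):
                return handler.handle(query_info)  # first match wins
        return None


coverage_registry = HandlerRegistry("coverage")


@coverage_registry.register(priority=20)  # registered first, but tried last
class CatchAllHandler:
    def can_handle(self, query_info):
        return True

    def handle(self, query_info):
        return HandlerResult("catch_all", "generic coverage answer")


@coverage_registry.register(priority=5)
class GenerationHandler:
    def can_handle(self, query_info):
        return query_info.get("type") == "test_generation"

    def handle(self, query_info):
        return HandlerResult("generation", "test recommendations")
```

Dispatching {"type": "test_generation"} reaches GenerationHandler even though it was registered after the catch-all, because sorting by priority puts it first.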

Detection Modes

Heuristic Detection

This is the default mode when no runtime coverage data is available. A method is flagged as untested when edges_call contains no call to it from a method whose name starts with test_:

-- Methods with no test callers
-- ('_' is a single-character LIKE wildcard, so it is escaped to require
--  a literal "test_" prefix)
SELECT m.id, m.name, m.full_name
FROM nodes_method m
WHERE NOT EXISTS (
    SELECT 1 FROM edges_call ec
    JOIN nodes_method caller ON ec.source_id = caller.id
    WHERE ec.target_id = m.id
    AND caller.name LIKE 'test\_%' ESCAPE '\'
)
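The heuristic can be tried on toy data. This sketch uses Python's built-in sqlite3 as a stand-in for the project's DuckDB database, with a two-table schema that is a minimal assumption (only the columns the query touches). The query uses ESCAPE so that `_` in "test_" is matched literally rather than as the LIKE wildcard.

```python
import sqlite3

# Minimal stand-in schema: only the columns the query reads.
SCHEMA = """
CREATE TABLE nodes_method (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT);
CREATE TABLE edges_call (source_id INTEGER, target_id INTEGER);
"""

# Same shape as the heuristic query; ESCAPE makes "_" literal in the pattern.
UNTESTED_SQL = r"""
SELECT m.id, m.name, m.full_name
FROM nodes_method m
WHERE NOT EXISTS (
    SELECT 1 FROM edges_call ec
    JOIN nodes_method caller ON ec.source_id = caller.id
    WHERE ec.target_id = m.id
    AND caller.name LIKE 'test\_%' ESCAPE '\'
)
ORDER BY m.id
"""


def find_untested(conn: sqlite3.Connection) -> list[str]:
    return [name for _id, name, _full in conn.execute(UNTESTED_SQL)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO nodes_method VALUES (?, ?, ?)",
    [
        (1, "heap_insert", "heapam.heap_insert"),
        (2, "heap_delete", "heapam.heap_delete"),
        (3, "test_heap_insert", "tests.test_heap_insert"),
        (4, "testify_delete", "tools.testify_delete"),
    ],
)
conn.executemany(
    "INSERT INTO edges_call VALUES (?, ?)",
    [
        (3, 1),  # test_heap_insert calls heap_insert
        (4, 2),  # testify_delete calls heap_delete, but is not a test
    ],
)
print(find_untested(conn))
```

Only heap_insert drops out. heap_delete stays flagged because its sole caller, testify_delete, lacks a literal test_ prefix. The two methods that nothing calls are also returned, since the query as shown does not exclude them.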

Hybrid Detection

Hybrid detection is activated automatically when the coverage_percent column exists in nodes_method (it is added and populated by the coverage import):

  • Methods with coverage_percent < 1.0 → flagged via runtime data
  • Methods with NULL coverage_percent → fallback to heuristic test-caller analysis
  • Each candidate tagged with detection_method: "runtime" or "heuristic"
  • Coverage estimate uses AVG(coverage_percent) from runtime data

The _has_coverage_data() guard ensures zero behavior change when no data is imported.
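The per-method rules above can be sketched in plain Python. The sketch assumes each row already carries its coverage_percent (None standing in for SQL NULL) and the outcome of the edges_call heuristic. MethodRow and classify_hybrid are hypothetical names, and the threshold defaults to the 1.0 cutoff stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MethodRow:
    name: str
    coverage_percent: Optional[float]  # None plays the role of SQL NULL
    has_test_caller: bool  # outcome of the edges_call heuristic


def classify_hybrid(rows: list[MethodRow], threshold: float = 1.0):
    """Return (candidates, coverage_estimate) following the hybrid rules."""
    candidates = []
    for row in rows:
        if row.coverage_percent is not None:
            # Runtime data exists: trust it and ignore the heuristic.
            if row.coverage_percent < threshold:
                candidates.append({"name": row.name, "detection_method": "runtime"})
        elif not row.has_test_caller:
            # No runtime data for this method: fall back to the heuristic.
            candidates.append({"name": row.name, "detection_method": "heuristic"})
    measured = [r.coverage_percent for r in rows if r.coverage_percent is not None]
    # Like SQL AVG(), skip NULLs; None when nothing was measured.
    estimate = sum(measured) / len(measured) if measured else None
    return candidates, estimate


rows = [
    MethodRow("parse_query", 0.0, has_test_caller=False),
    MethodRow("exec_plan", 100.0, has_test_caller=True),
    MethodRow("helper_func", None, has_test_caller=False),
    MethodRow("util_fn", None, has_test_caller=True),
]
candidates, estimate = classify_hybrid(rows)
```

Here parse_query is tagged "runtime", helper_func is tagged "heuristic", and the estimate averages only the two measured methods.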

Handlers

UntestedCodeHandler

Handles the untested_code_scan intent. Key methods:

| Method | Description |
|--------|-------------|
| can_handle(query_info) | Returns True when type == "untested_code_scan" |
| handle(query_info) | Finds untested code, classifies it by criticality, generates recommendations |
| _find_untested_functions() | Heuristic + optional runtime coverage detection |
| _classify_by_criticality(candidates) | Groups by risk level (critical/high/medium/low) |
| _estimate_coverage(candidates) | Calculates coverage percentage |
| _has_coverage_data() | Checks for the coverage_percent column |

The handler enriches the top 20 candidates with CPG-based test recommendations (branch coverage, parameter boundaries, error paths).

TestGenerationHandler

Handles the test_generation intent. Key methods:

| Method | Description |
|--------|-------------|
| can_handle(query_info) | Returns True when type == "test_generation" and function names can be extracted |
| handle(query_info) | Generates test recommendations for specific functions |
| _extract_function_names(query) | Extracts function names via priority patterns |
| _search_functions_by_keywords(query) | Concept-based function search (e.g., “tests for query execution”) |
| _get_function_info(func_name) | Case-insensitive function lookup |
| _get_function_callees(func_name) | Dependencies to mock |
| _get_function_callers(func_name) | Test scenarios from callers |
| _suggest_test_approach(...) | Generates unit/integration/edge-case strategy |
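The actual priority patterns behind _extract_function_names are not listed here, so the four regexes below are hypothetical. This sketch only shows the general idea of trying patterns from most to least explicit and returning nothing when the query names a concept rather than a function, which is the case where a keyword search such as _search_functions_by_keywords would take over.

```python
import re

# Hypothetical patterns, tried in order; the first that yields names wins.
BACKTICK = re.compile(r"`([A-Za-z_]\w*)`")                # `heap_insert`
CALL_SYNTAX = re.compile(r"\b([A-Za-z_]\w*)\s*\(\)")       # heap_insert()
NAMED_FUNCTION = re.compile(
    r"\b(?:for|in)\s+([A-Za-z_]\w*)\s+function\b", re.IGNORECASE  # for palloc function
)
AFTER_PREPOSITION = re.compile(r"\b(?:for|in)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def _looks_like_code(word: str) -> bool:
    # snake_case or camelCase/PascalCase words are treated as identifiers.
    return "_" in word or re.search(r"[a-z][A-Z]", word) is not None


def extract_function_names(query: str) -> list[str]:
    for pattern in (BACKTICK, CALL_SYNTAX, NAMED_FUNCTION):
        names = pattern.findall(query)
        if names:
            return names
    # Last resort: an identifier-looking word after "for"/"in".
    # An empty result means a keyword-based search should be used instead.
    return [w for w in AFTER_PREPOSITION.findall(query) if _looks_like_code(w)]
```

With these patterns, "Write mutation tests for query parser" yields no names because "query" does not look like an identifier.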

TestPriorityHandler

Handles the test_priority intent. It ranks untested functions using 4 scoring factors:

| Factor | Score | Condition |
|--------|-------|-----------|
| Module criticality | +3.0 | api, interface, core modules |
| Module criticality | +2.0 | main, engine, system modules |
| Complexity | +1.5 | Signature length > 200 chars |
| Complexity | +1.0 | Signature length > 100 chars |
| Public API | +1.0 | No underscore prefix in name |
| Caller count | +2.0 | Above high contention threshold |
| Caller count | +1.0 | Above medium caller threshold |

The score is converted to a priority level via _score_to_rating(): high (score ≥ threshold), medium, or low.
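The scoring factors combine additively. In this sketch the base score is taken from coverage_base_score (5.0). The caller-count and rating thresholds are hypothetical placeholders for the configured values. Matching module names against whole path segments is also an assumption, so the numbers it produces are illustrative only.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds; the real values come from configuration.
HIGH_CALLERS = 20
MEDIUM_CALLERS = 5
HIGH_RATING = 9.0
MEDIUM_RATING = 7.0

CRITICAL_MODULES = {"api", "interface", "core"}
IMPORTANT_MODULES = {"main", "engine", "system"}


@dataclass
class Candidate:
    name: str
    module: str  # file or module path, e.g. "src/core/heapam.c"
    signature: str
    caller_count: int


def score(c: Candidate, base: float = 5.0) -> float:
    s = base  # coverage_base_score
    # Match whole path segments so that e.g. "rapid" does not count as "api".
    segments = set(re.split(r"[/.]", c.module.lower()))
    if segments & CRITICAL_MODULES:
        s += 3.0
    elif segments & IMPORTANT_MODULES:
        s += 2.0
    if len(c.signature) > 200:
        s += 1.5
    elif len(c.signature) > 100:
        s += 1.0
    if not c.name.startswith("_"):
        s += 1.0  # public API
    if c.caller_count > HIGH_CALLERS:
        s += 2.0
    elif c.caller_count > MEDIUM_CALLERS:
        s += 1.0
    return s


def score_to_rating(s: float) -> str:
    if s >= HIGH_RATING:
        return "high"
    if s >= MEDIUM_RATING:
        return "medium"
    return "low"


a = Candidate("heap_lock_updated_tuple", "src/core/heapam.c", "void f(int x)", 23)
b = Candidate("AtEOXact_RelationCache", "src/system/relcache.c", "void g(bool c)", 4)
c = Candidate("_helper", "src/utils/misc.c", "x" * 120, 0)
```

For the three candidates above, the scores are 11.0, 8.0, and 6.0, which rate as high, medium, and low.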

Impact Analysis

The workflow uses CallGraphAnalyzer (Graph Method #2) from src/analysis/callgraph/analyzer.py for impact analysis on untested methods.

ImpactAnalysis dataclass:

| Field | Type | Description |
|-------|------|-------------|
| method_name | str | Analyzed method |
| direct_callers | list[str] | Methods calling this directly |
| transitive_callers | list[str] | All transitive callers |
| direct_callees | list[str] | Methods called directly |
| transitive_callees | list[str] | All transitive callees |
| impact_score | float | Impact score in the range 0.0–1.0 |

Three graph insight categories are tracked in state["metadata"]:

| Insight | Condition |
|---------|-----------|
| high_impact_untested | impact_score > thresholds.high_impact |
| untested_entry_points | Many callers + few callees |
| critical_untested | callers > min_callers && impact_score > impact_score_medium |
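The fields of ImpactAnalysis can be filled with breadth-first search over the call graph in both directions. This sketch does not use CallGraphAnalyzer. It takes a plain edge list, and its impact_score formula (the share of other methods that transitively call the target) is a hypothetical stand-in, because the real formula is not given here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ImpactAnalysis:
    method_name: str
    direct_callers: list[str]
    transitive_callers: list[str]
    direct_callees: list[str]
    transitive_callees: list[str]
    impact_score: float


def _reachable(start: str, adjacency: dict[str, list[str]]) -> list[str]:
    # Breadth-first search; `seen` guards against cycles in the call graph.
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


def analyze(method: str, edges: list[tuple[str, str]]) -> ImpactAnalysis:
    callees: dict[str, list[str]] = {}
    callers: dict[str, list[str]] = {}
    nodes: set[str] = set()
    for src, dst in edges:  # each edge means: src calls dst
        callees.setdefault(src, []).append(dst)
        callers.setdefault(dst, []).append(src)
        nodes.update((src, dst))
    transitive_callers = _reachable(method, callers)
    # Hypothetical score: share of the other methods that can reach this one.
    others = max(len(nodes) - 1, 1)
    return ImpactAnalysis(
        method_name=method,
        direct_callers=sorted(callers.get(method, [])),
        transitive_callers=transitive_callers,
        direct_callees=sorted(callees.get(method, [])),
        transitive_callees=_reachable(method, callees),
        impact_score=len(transitive_callers) / others,
    )


edges = [
    ("main", "exec_plan"),
    ("exec_plan", "heap_insert"),
    ("simple_heap_insert", "heap_insert"),
    ("heap_insert", "XLogInsert"),
]
result = analyze("heap_insert", edges)
```

Here heap_insert is reached by three of the four other methods, so its hypothetical impact_score is 0.75.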

CPG-Based Recommendations

For each untested method (top 20 by criticality), the handler generates specific recommendations:

Branch Coverage Analysis

Counts control structures (IF, FOR, WHILE, SWITCH) from nodes_control_structure and estimates the number of test cases needed.
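The estimation rule itself is not spelled out. This sketch assumes one test case per decision point plus one for the fall-through path, in the spirit of cyclomatic complexity. That rule reproduces the "IF: 3, SWITCH: 1" to 5 test cases recommendation shown in the Use Cases section. The input is assumed to be the list of control-structure types for one method.

```python
from collections import Counter

BRANCHING = ("IF", "FOR", "WHILE", "SWITCH")


def branch_recommendation(control_types: list[str]) -> str:
    # Count only the branching kinds; other rows (e.g. TRY) are ignored here.
    counts = Counter(t for t in control_types if t in BRANCHING)
    if not counts:
        return "Add 1 test case for the single straight-line path"
    # One case per decision point plus one for the fall-through path.
    needed = sum(counts.values()) + 1
    detail = ", ".join(f"{kind}: {n}" for kind, n in sorted(counts.items()))
    return f"Add {needed} test cases for branch coverage ({detail})"
```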

Parameter Boundary Analysis

Maps parameter types from nodes_param to boundary test suggestions:

| Parameter type | Boundary tests |
|----------------|----------------|
| int, long, size_t, float | zero, negative, max, min |
| char*, string, str | empty, null, very long, special chars |
| Pointer types (*, ptr) | null pointer |
| bool | true, false |
| Variadic parameters | zero args, one arg, many args |
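The table can be implemented as a type-to-tests lookup. This sketch makes several assumptions: type strings are normalized by dropping const and spaces, variadic parameters arrive with the type "...", and string types are checked before generic pointers so that char* gets string tests instead of only a null-pointer test.

```python
import re

NUMERIC = {"int", "long", "size_t", "float"}
STRING = {"char*", "string", "str"}


def boundary_tests(param_type: str) -> list[str]:
    # Normalize e.g. "const char *" to "char*" before matching.
    t = re.sub(r"\bconst\b", "", param_type).replace(" ", "").lower()
    if t == "...":  # assumed representation of a variadic parameter
        return ["zero args", "one arg", "many args"]
    if t in STRING:  # checked before the generic pointer rule
        return ["empty", "null", "very long", "special chars"]
    if t in NUMERIC:
        return ["zero", "negative", "max", "min"]
    if t.endswith("*") or "ptr" in t:
        return ["null pointer"]
    if t == "bool":
        return ["true", "false"]
    return []  # no boundary suggestion for other types
```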

Error Path Analysis

Counts TRY blocks in nodes_control_structure and RETURN statements in nodes_return. Multiple return statements suggest error handling paths.
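Both error-path signals can be sketched together. The inputs are assumed to be the control-structure types and the RETURN count for one method. The wording of the first message mirrors the example output in the Use Cases section, and the second message is hypothetical.

```python
def error_path_recommendations(control_types: list[str], return_count: int) -> list[str]:
    recs = []
    tries = sum(1 for t in control_types if t == "TRY")
    if tries:
        recs.append(f"Test error handling: {tries} try/catch blocks")
    if return_count > 1:
        # Several RETURN nodes usually mean early exits, often on error paths.
        recs.append(f"Cover all {return_count} return paths, including early error returns")
    return recs
```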

Runtime Coverage Import

Import coverage data from external tools to enable hybrid detection.

Supported Formats

| Format | Tool | File type |
|--------|------|-----------|
| pytest-cov | pytest-cov (--cov-report=json) | JSON |
| lcov | gcov / lcov / geninfo | Text (.info, .lcov) |
| cobertura | Cobertura, JaCoCo, coverage.py XML | XML |
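Of the three formats, lcov is the simplest to read by hand. In an lcov tracefile, each file section starts with SF:<path>, line hits appear as DA:<line>,<count>[,<checksum>], and the section ends with end_of_record. This is a minimal sketch of a parser that extracts per-file line hits and ignores all other record types. It is not the importer's actual parser.

```python
from collections import defaultdict


def parse_lcov(text: str) -> dict[str, dict[int, int]]:
    """Map each source file to {line_number: hit_count} from DA records."""
    hits: dict[str, dict[int, int]] = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            # DA:<line>,<count>[,<checksum>]; the optional checksum is ignored.
            fields = line[3:].split(",")
            hits[current][int(fields[0])] = int(fields[1])
        elif line == "end_of_record":
            current = None
    return dict(hits)


SAMPLE = """TN:
SF:src/backend/access/heap/heapam.c
DA:10,1
DA:11,0
DA:12,3
end_of_record
"""
```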

How It Works

  1. The parser reads the coverage report and extracts per-file line-level hit data
  2. The importer adds a coverage_percent column to nodes_method (if absent)
  3. Each method is matched to coverage data by suffix-matching the filename and intersecting the method’s line range with covered lines
  4. coverage_percent is computed as covered_lines_in_range / total_lines_in_range * 100

Path normalization: Coverage reports often contain absolute paths while the CPG stores relative paths. The importer normalizes paths (strips ./, converts backslashes) and falls back to suffix matching. Use --source-root to strip a common prefix.
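Steps 3 and 4 and the path normalization can be sketched as three small functions. The names are hypothetical. Two choices are assumptions: "total lines in range" is read as the instrumented lines inside the method's range, and a method with no instrumented lines gets None, which stays NULL and so falls back to the heuristic.

```python
from typing import Optional


def normalize(path: str, source_root: Optional[str] = None) -> str:
    p = path.replace("\\", "/")  # convert backslashes
    if source_root:
        root = source_root.replace("\\", "/").rstrip("/") + "/"
        if p.startswith(root):
            p = p[len(root):]  # strip the common prefix (--source-root)
    while p.startswith("./"):
        p = p[2:]
    return p


def find_report_file(cpg_path: str, report_files: list[str],
                     source_root: Optional[str] = None) -> Optional[str]:
    """Return the report path equal to, or ending with, the CPG path."""
    target = normalize(cpg_path)
    for report_path in report_files:
        candidate = normalize(report_path, source_root)
        # Requiring "/" before the suffix avoids matching "myheap.c" to "heap.c".
        if candidate == target or candidate.endswith("/" + target):
            return report_path
    return None


def method_coverage(start_line: int, end_line: int,
                    line_hits: dict[int, int]) -> Optional[float]:
    in_range = [ln for ln in line_hits if start_line <= ln <= end_line]
    if not in_range:
        return None  # stays NULL, so hybrid detection uses the heuristic
    covered = sum(1 for ln in in_range if line_hits[ln] > 0)
    return covered / len(in_range) * 100
```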

Configuration

Coverage-related parameters from get_unified_config():

Thresholds

| Parameter | Default | Description |
|-----------|---------|-------------|
| coverage_high | 0.75 | High coverage threshold |
| coverage_low | 0.25 | Low coverage threshold |
| test_coverage_minimum | 50 | Minimum required coverage % |
| test_coverage_good | 80 | Good coverage % |
| high_impact | | Min score for high-impact methods |
| min_callers | | Min callers for critical classification |
| impact_score_medium | | Medium impact threshold |

Scoring Weights

| Parameter | Default | Description |
|-----------|---------|-------------|
| coverage_base_score | 5.0 | Base priority score |
| coverage_critical_module | 3.0 | Bonus for critical modules |
| coverage_important_module | 2.0 | Bonus for important modules |
| coverage_long_signature | 1.5 | Bonus for complex signatures (>200 chars) |
| coverage_medium_signature | 1.0 | Bonus for medium signatures (>100 chars) |

Handler Limits

| Parameter | Default | Description |
|-----------|---------|-------------|
| display_items | 15 | Items in report output |
| summary_items | 5 | Summary items |
| query_medium | 30 | Medium query limit (impact analysis) |
| cpg_results | 50 | CPG result limit |
| retrieved_functions | 25 | Functions for benchmark evaluation |
| priority_functions | 10 | Prioritized list size |

CLI Usage

# Import pytest-cov JSON report
python -m src.cli.import_commands coverage import --file coverage.json --format pytest-cov --db data/projects/postgres.duckdb

# Import lcov trace file
python -m src.cli.import_commands coverage import --file lcov.info --format lcov

# Import Cobertura XML (e.g., from Java/C# tooling)
python -m src.cli.import_commands coverage import --file coverage.xml --format cobertura --source-root /project

# View imported coverage data
python -m src.cli.import_commands coverage show
python -m src.cli.import_commands coverage show --uncovered-only

REST API

Test coverage queries are handled via the scenario router:

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/v1/scenarios/test_coverage/query | Execute test coverage analysis |

Request model (ScenarioQueryRequest):

| Field | Type | Description |
|-------|------|-------------|
| query | str | Analysis query (1–10000 chars) |
| session_id | str? | Session identifier |
| language | str | "en" or "ru" (default: "en") |

Response model (ScenarioQueryResponse):

| Field | Type | Description |
|-------|------|-------------|
| answer | str | Formatted analysis result |
| scenario_id | str | "test_coverage" |
| confidence | float | Intent confidence |
| evidence | list[dict] | Supporting evidence |
| processing_time_ms | float | Processing time |

Example:

curl -X POST http://localhost:8000/api/v1/scenarios/test_coverage/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Find untested functions", "language": "en"}'
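The same request can be built from Python with only the standard library. This sketch assumes the server address from the curl example and applies the documented request constraints on the client side. build_query_request and run_query are hypothetical helper names, and only run_query actually contacts the server.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # as in the curl example; adjust per deployment


def build_query_request(query: str, language: str = "en",
                        session_id: Optional[str] = None) -> urllib.request.Request:
    # Client-side checks mirroring the documented ScenarioQueryRequest constraints.
    if not 1 <= len(query) <= 10000:
        raise ValueError("query must be 1-10000 characters")
    if language not in ("en", "ru"):
        raise ValueError('language must be "en" or "ru"')
    payload = {"query": query, "language": language}
    if session_id is not None:
        payload["session_id"] = session_id
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/scenarios/test_coverage/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(req: urllib.request.Request) -> dict:
    # Sends the request; requires a running server.
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

With a server running, `run_query(build_query_request("Find untested functions"))["answer"]` would return the formatted report described by ScenarioQueryResponse.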

Use Cases

Finding Untested Code

> What functions lack test coverage?

## Coverage Gaps

**Detection mode:** Heuristic
**Total untested:** 234 functions
**Coverage estimate:** 78%

### Critical (executor)
- ExecParallelHashJoinNewBatch()
- ExecReScanGather()

### High priority (storage)
- heap_lock_updated_tuple()
- heap_abort_speculative()

### Recommendations for `heap_lock_updated_tuple`
1. Add 5 test cases for branch coverage (IF: 3, SWITCH: 1)
2. Test boundary values for parameter `flags` (zero, negative, max)
3. Test error handling: 2 try/catch blocks

Test Prioritization

> Which critical functions need tests first?

## Test Priority Ranking

| Function | Score | Priority | Reason |
|----------|-------|----------|--------|
| heap_lock_updated_tuple | 8.5 | high | core module, 23 callers, public API |
| ExecParallelHashJoinNewBatch | 7.0 | high | engine module, complex signature |
| AtEOXact_RelationCache | 5.0 | medium | system module, 4 callers |

Generating Test Cases

> Generate test cases for heap_insert

## Test Recommendations: heap_insert()

**File:** src/backend/access/heap/heapam.c:2156

**Strategy:**
- Unit Tests: Test heap_insert in isolation by mocking 5 dependencies
- Integration Tests: Test through 23 callers to verify real-world usage
- Edge Cases: Test boundary conditions, null inputs, error handling

**Dependencies to Mock:**
- RelationGetBufferForTuple()
- heap_prepare_insert()
- XLogInsert()

**Test Scenarios from Callers:**
- simple_heap_insert() uses heap_insert for...
- toast_save_datum() uses heap_insert for...

Hybrid Detection (with runtime data)

> Find untested code

## Coverage Gaps (Hybrid)

**Detection mode:** Runtime + Heuristic
**Coverage estimate:** 62.3% (from runtime data)

| Method | Detection | Coverage | Reason |
|--------|-----------|----------|--------|
| parse_query() | Runtime | 0.0% | No lines covered |
| exec_plan() | Runtime | 12.5% | Partial coverage |
| helper_func() | Heuristic | --- | No test callers |

Example Questions

Untested code detection:

  • “What functions lack test coverage?”
  • “Find untested code”
  • “Show functions without tests”
  • “Which code is not covered by tests?”

Test prioritization:

  • “Which critical functions need tests first?”
  • “What should I test first?”
  • “Test priority ranking”

Test generation:

  • “Generate test cases for heap_insert”
  • “Create tests for palloc function”
  • “What edge cases should I test in ExecInitNode?”
  • “Write mutation tests for query parser”

Coverage overview:

  • “Show coverage gaps”
  • “Coverage report”
  • “How to improve test coverage?”