Scenario 05: Refactoring

This scenario helps a senior developer identify code improvement opportunities using CPG-powered (code property graph) analysis.

Quick Start

# Select Refactoring Scenario
/select 05

How It Works

Query Classification

When you ask a refactoring question, the system classifies it into one of four types using keyword matching (supports both English and Russian queries):

| Query Type | Keywords (EN) | Keywords (RU) | Routed To |
|---|---|---|---|
| dead_code | dead code, unused, unreachable, deprecated | мёртвый код, неиспользуемый, недостижимый | DeadCodeDetector |
| duplicates | duplicate, clone, copy-paste, identical | дубликат, клон, копирование | ASTCloneDetector |
| complexity | complexity, cyclomatic, nesting, god class | сложность, цикломатическая | TechnicalDebtDetector |
| general | refactor, improve, code smell | рефакторинг, улучшить, запах кода | Full pipeline |

Classification is performed by detect_refactoring_query_type(), which scans the query for these keywords and routes it to the matching detector.
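The keyword routing described above can be sketched as follows. This is a minimal illustration, not the project's actual detect_refactoring_query_type(): the function name classify_query, the first-match rule, and the plain substring test are all assumptions, and the real function may weigh or order keywords differently.

```python
# Hypothetical keyword table mirroring the one in this section.
KEYWORDS = {
    "dead_code": ["dead code", "unused", "unreachable", "deprecated",
                  "мёртвый код", "неиспользуемый", "недостижимый"],
    "duplicates": ["duplicate", "clone", "copy-paste", "identical",
                   "дубликат", "клон", "копирование"],
    "complexity": ["complexity", "cyclomatic", "nesting", "god class",
                   "сложность", "цикломатическая"],
}


def classify_query(query: str) -> str:
    """Return the first query type whose keyword occurs in the query."""
    text = query.lower()
    for query_type, keywords in KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return query_type
    # Nothing specific matched: fall through to the full pipeline.
    return "general"
```

A plain substring test does not handle inflected Russian forms (for example "неиспользуемые"), so a production classifier would likely need stemming or a broader keyword list.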

Two-Phase Architecture

The refactoring scenario uses a two-phase processing pipeline:

User Query
    |
    v
[Phase 1] Handler-based (template responses, no LLM)
    |-- CodeCloneHandler   → structured clone report
    |-- DeadCodeHandler     → structured dead code report
    |-- CodeSmellHandler    → structured smell report
    |
    v (if handler cannot answer)
[Phase 2] Full LLM pipeline
    |-- TechnicalDebtDetector.detect_all_smells()
    |-- ImpactAnalyzer.analyze_bulk_impact()
    |-- RefactoringPlanner.create_refactoring_plan()
    |-- LLM generates natural language response

Phase 1 (handler-based) returns structured answers for common queries immediately, with no LLM call. Phase 2 (LLM fallback) handles complex or ambiguous queries that need natural language synthesis.
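The fallback between the two phases can be pictured as a chain of handlers that either answer or decline. The sketch below is illustrative only: clone_handler, dead_code_handler, llm_pipeline, and answer are hypothetical stand-ins, and the real handlers decide whether they can answer with more than a keyword check.

```python
from typing import Callable, Optional

# A handler returns a structured report, or None when it cannot answer.
Handler = Callable[[str], Optional[str]]


def clone_handler(query: str) -> Optional[str]:
    """Stand-in for CodeCloneHandler: answers only clone questions."""
    if "duplicate" in query.lower():
        return "clone report"
    return None


def dead_code_handler(query: str) -> Optional[str]:
    """Stand-in for DeadCodeHandler: answers only dead code questions."""
    if "dead code" in query.lower():
        return "dead code report"
    return None


def llm_pipeline(query: str) -> str:
    """Stand-in for Phase 2; the real pipeline runs detectors and an LLM."""
    return f"LLM answer for: {query}"


def answer(query: str, handlers: list[Handler]) -> tuple[str, str]:
    """Try each Phase 1 handler in order, then fall back to Phase 2."""
    for handler in handlers:
        report = handler(query)
        if report is not None:
            return "phase1", report
    return "phase2", llm_pipeline(query)
```

Returning None to signal "cannot answer" keeps the handlers independent of each other, so adding a new Phase 1 handler only means appending it to the list.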

Three Workflow Modes

| Mode | Description | Use Case |
|---|---|---|
| code_smells | Code smell detection, dead code, complexity analysis (default) | Day-to-day code quality |
| large_scale | Bulk refactoring with ROI analysis and prioritization | Sprint planning |
| mass_migration | Symbol/API migrations, automated renaming | Framework upgrades |

The mass_migration mode routes to mass_migration_workflow() — a specialized pipeline for symbol renaming across the entire codebase. See Mass Refactoring (S13) for bulk rename operations.

Dead Code Detection

10 Detection Patterns

The DeadCodeDetector identifies dead code through 10 specialized patterns, each with a confidence score reflecting detection reliability:

| Pattern | Confidence | Description |
|---|---|---|
| DEPRECATED_MARKER | 0.95 | Functions with explicit deprecation markers |
| UNREACHABLE_AFTER_RETURN | 0.85 | Code after unconditional return/break/continue |
| INVARIANT_DEAD_CODE | 0.80 | Conditions that are always true or always false |
| DEAD_CODE | 0.70 | Functions with zero callers in call graph |
| EMPTY_STUB | 0.65 | Functions with empty body or placeholder implementation |
| UNUSED_VARIABLE | 0.60 | Variables declared but never read |
| DEAD_CALLBACK | 0.55 | Callback functions that are never registered |
| SINGLE_CALLER_FUNCTION | 0.50 | Functions called from only one location (inline candidates) |
| ORPHAN_COMPONENT | 0.45 | Components disconnected from the main call graph |
| TEST_ONLY_FUNCTION | 0.40 | Non-test functions called exclusively from test code |

Findings are ranked by confidence: higher-confidence patterns are more likely to be true positives and are safer to act on.
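As a rough illustration of confidence-based ranking, the sketch below sorts findings by the per-pattern scores listed in this section. The rank_findings function, the (pattern, location) tuple shape, and the min_confidence cutoff are assumptions made for the example, not the detector's actual interface.

```python
# Confidence scores copied from the pattern list in this section.
PATTERN_CONFIDENCE = {
    "DEPRECATED_MARKER": 0.95,
    "UNREACHABLE_AFTER_RETURN": 0.85,
    "INVARIANT_DEAD_CODE": 0.80,
    "DEAD_CODE": 0.70,
    "EMPTY_STUB": 0.65,
    "UNUSED_VARIABLE": 0.60,
    "DEAD_CALLBACK": 0.55,
    "SINGLE_CALLER_FUNCTION": 0.50,
    "ORPHAN_COMPONENT": 0.45,
    "TEST_ONLY_FUNCTION": 0.40,
}


def rank_findings(findings, min_confidence=0.0):
    """Sort (pattern, location) findings by descending pattern confidence.

    Findings whose pattern scores below min_confidence are dropped.
    """
    scored = [
        (PATTERN_CONFIDENCE[pattern], pattern, location)
        for pattern, location in findings
        if PATTERN_CONFIDENCE[pattern] >= min_confidence
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

A cutoff such as min_confidence=0.7 would keep only the patterns that are safest to act on without manual review.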

Intent-Based Filtering

The system detects your intent from query keywords and runs only the relevant patterns:

"Find deprecated functions"     → DEPRECATED_MARKER only
"Find unused static functions"  → DEAD_CODE + SINGLE_CALLER_FUNCTION
"Find unreachable code"         → UNREACHABLE_AFTER_RETURN + INVARIANT_DEAD_CODE
"Find dead code"                → All default patterns (DEAD_CODE + DEPRECATED_MARKER + EMPTY_STUB)

Russian queries work the same way:

"Найти устаревшие функции"      → DEPRECATED_MARKER
"Найти неиспользуемый код"      → DEAD_CODE + UNUSED_VARIABLE + SINGLE_CALLER_FUNCTION
"Найти недостижимый код"        → UNREACHABLE_AFTER_RETURN + INVARIANT_DEAD_CODE

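The query-to-pattern mappings listed above can be sketched as an ordered rule table. The select_patterns function and the keyword stems (such as "устаревш") are illustrative guesses chosen so that the example queries match; the actual intent detection may use different keywords.

```python
DEFAULT_PATTERNS = ["DEAD_CODE", "DEPRECATED_MARKER", "EMPTY_STUB"]

# (keyword stems, patterns) pairs, checked in order. The stems are
# hypothetical; only the resulting pattern sets come from this section.
INTENT_RULES = [
    (("deprecated", "устаревш"), ["DEPRECATED_MARKER"]),
    (("unreachable", "недостижим"),
     ["UNREACHABLE_AFTER_RETURN", "INVARIANT_DEAD_CODE"]),
    (("неиспользуем",),
     ["DEAD_CODE", "UNUSED_VARIABLE", "SINGLE_CALLER_FUNCTION"]),
    (("unused",), ["DEAD_CODE", "SINGLE_CALLER_FUNCTION"]),
]


def select_patterns(query: str) -> list[str]:
    """Return the detection patterns to run for a dead code query."""
    text = query.lower()
    for stems, patterns in INTENT_RULES:
        if any(stem in text for stem in stems):
            return list(patterns)
    # No specific intent found: run the default pattern set.
    return list(DEFAULT_PATTERNS)
```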
Dead Code Examples

> Find unused static functions

╭─────────────── Dead Code Analysis ──────────────────────────╮
│                                                              │
│  Unused Static Functions (no callers found):                 │
│                                                              │
│  src/backend/utils/misc/help.c:                              │
│    - old_format_help()         Lines: 45-89                  │
│    - legacy_usage_message()    Lines: 123-156                │
│                                                              │
│  src/backend/catalog/pg_type.c:                              │
│    - deprecated_type_check()   Lines: 789-834                │
│                                                              │
│  Confidence: HIGH (static functions, no external refs)       │
│  Recommendation: Safe to remove after verification           │
│                                                              │
╰──────────────────────────────────────────────────────────────╯

Code Duplication

AST Clone Detection

The ASTCloneDetector (src/analysis/clone_detector.py) identifies duplicate code by comparing Abstract Syntax Trees rather than raw text. This catches clones that a textual comparison would miss, such as copies that differ only in variable names.
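To show why tree comparison tolerates renamed variables, here is a small sketch that uses Python's standard ast module on Python source. It is not the ASTCloneDetector implementation: the fingerprint and is_clone helpers are hypothetical, the sketch only detects exact structural matches, and the real detector reports similarity percentages and analyzes other languages through the CPG.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Rename every variable and argument to a positional placeholder."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # The first distinct name becomes v0, the second v1, and so on.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = "f"  # function names do not affect the structure
        self.generic_visit(node)
        return node


def fingerprint(source: str) -> str:
    """Return a name-independent dump of the source's syntax tree."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)


def is_clone(source_a: str, source_b: str) -> bool:
    """True when both snippets have the same normalized syntax tree."""
    return fingerprint(source_a) == fingerprint(source_b)
```

Two functions that sum a list with different identifier names produce the same fingerprint, while a function with a different body does not.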

The CodeCloneHandler (Phase 1) provides structured clone reports without LLM:

> Find duplicate code patterns

╭─────────────── Clone Detection ─────────────────────────────╮
│                                                              │
│  Duplicate Code Clusters:                                    │
│                                                              │
│  Cluster #1 (95% similarity, 34 lines):                      │
│    - src/backend/access/heap/heapam.c:1234-1267              │
│    - src/backend/access/heap/heapam.c:1890-1923              │
│    Suggestion: Extract common logic into helper function     │
│                                                              │
│  Cluster #2 (88% similarity, 22 lines):                      │
│    - src/backend/executor/nodeSeqscan.c:45-66                │
│    - src/backend/executor/nodeIndexscan.c:78-99              │
│    - src/backend/executor/nodeBitmapscan.c:89-110            │
│    Suggestion: Create shared scan initialization routine     │
│                                                              │
│  Total clones: 15 clusters                                   │
│  Lines duplicated: ~450                                      │
│                                                              │
╰──────────────────────────────────────────────────────────────╯

Complexity Analysis

The TechnicalDebtDetector identifies functions with excessive cyclomatic and cognitive complexity:

> Find functions with cyclomatic complexity over 20

╭─────────────── Complexity Analysis ─────────────────────────╮
│                                                              │
│  Functions Exceeding Threshold (complexity > 20):            │
│                                                              │
│  exec_simple_query()        complexity: 47                   │
│     Location: src/backend/tcop/postgres.c:890                │
│     Suggestion: Extract authentication, parsing,             │
│                 and execution into separate functions        │
│                                                              │
│  ExecInitNode()             complexity: 32                   │
│     Location: src/backend/executor/execProcnode.c:156        │
│     Suggestion: Use dispatch table instead of switch         │
│                                                              │
│  Total: 23 functions above threshold                         │
│                                                              │
╰──────────────────────────────────────────────────────────────╯
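For readers unfamiliar with the metric, cyclomatic complexity is commonly approximated as one plus the number of decision points in a function. The sketch below counts them for Python source using the standard ast module; it is an approximation for illustration, and the cyclomatic_complexity helper is hypothetical rather than the TechnicalDebtDetector's own calculation.

```python
import ast

# Node types that each add one independent path through the code.
_DECISION_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler,
    ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit branches.
            score += len(node.values) - 1
    return score
```

A threshold query such as "complexity over 20" then reduces to filtering functions whose score exceeds the limit.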

Impact Analysis

Call Graph Impact

The ImpactAnalyzer uses CallGraphAnalyzer to determine the blast radius of any refactoring change — how many callers, transitive dependents, and subsystems would be affected:

> What would be affected by refactoring heap_insert?

╭─────────────── Impact Analysis ─────────────────────────────╮
│                                                              │
│  Refactoring Target: heap_insert()                           │
│  Direct Callers: 23                                          │
│  Transitive Callers: 156                                     │
│                                                              │
│  Affected Subsystems:                                        │
│    - Executor (12 callers)                                   │
│    - COPY command (3 callers)                                │
│    - Trigger execution (4 callers)                           │
│    - Foreign data wrapper (4 callers)                        │
│                                                              │
│  Blast Radius: HIGH (156 transitive dependents)              │
│  Safe Refactorings: 3 (functions with ≤5 callers)            │
│                                                              │
╰──────────────────────────────────────────────────────────────╯
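The direct and transitive caller counts in a report like this can be derived by walking the call graph backwards from the target. The following sketch assumes the graph is a plain dict mapping each caller to the set of functions it calls; the callers_of_target helper and that representation are assumptions, not the ImpactAnalyzer or CallGraphAnalyzer API.

```python
from collections import deque


def callers_of_target(call_graph, target):
    """Return (direct, transitive) caller sets for target.

    call_graph maps each caller name to the set of functions it calls.
    The transitive set includes the direct callers.
    """
    # Invert the graph so each function maps to the functions calling it.
    callers_of = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers_of.setdefault(callee, set()).add(caller)

    direct = set(callers_of.get(target, ())) - {target}

    # Breadth-first walk over the inverted edges.
    transitive = set()
    queue = deque([target])
    while queue:
        function = queue.popleft()
        for caller in callers_of.get(function, ()):
            if caller != target and caller not in transitive:
                transitive.add(caller)
                queue.append(caller)
    return direct, transitive
```

The size of the transitive set corresponds to the blast radius figure; grouping its members by subsystem would give the per-subsystem breakdown.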

Betweenness Centrality

The DependencyAnalyzer computes betweenness centrality for each function in the call graph — identifying architectural chokepoints where many call paths converge. Functions with high betweenness are risky refactoring targets because changes propagate widely:

Betweenness Risk Assessment:
  - ExecProcNode()    percentile: 98%  risk: CRITICAL
  - heap_insert()     percentile: 92%  risk: HIGH
  - ExecScan()        percentile: 87%  risk: HIGH
  - pg_parse_query()  percentile: 45%  risk: LOW

A high betweenness percentile means that many shortest call paths pass through the function, so refactoring it can affect many subsystems.
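Betweenness centrality can be computed with Brandes' algorithm, which runs one breadth-first search per node and accumulates each node's share of shortest paths. The sketch below applies it to a directed, unweighted call graph given as a dict of caller to callees; the betweenness function is a hypothetical illustration, and the DependencyAnalyzer may use a library or a different normalization.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality for a directed call graph."""
    nodes = set(graph) | {c for callees in graph.values() for c in callees}
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        order = []                       # nodes in BFS visiting order
        preds = {n: [] for n in nodes}   # predecessors on shortest paths
        sigma = dict.fromkeys(nodes, 0)  # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to source.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score
```

Ranking the resulting scores and converting each rank to a percentile would give figures comparable to the risk assessment above.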

Technical Debt Metrics

The TechnicalDebtDetector.calculate_debt_metrics() computes quantitative debt metrics:

| Metric | Description |
|---|---|
| total_effort_hours | Estimated hours to fix all detected smells |
| debt_ratio | Technical debt as a ratio of total codebase effort |
| estimated_value | Dollar estimate of debt remediation value |
| by_severity | Breakdown: critical / high / medium / low |
| by_category | Breakdown: dead_code / complexity / duplication / … |

These metrics are included in the metadata of every refactoring response, enabling trend tracking across sprints.
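One plausible way to derive these metrics from a list of detected smells is sketched below. Everything beyond the metric names is an assumption: the summarize_debt function, the smell dictionary fields, the hourly_rate and codebase_effort_hours inputs, and the choice to report the breakdowns as counts are illustrative and may not match calculate_debt_metrics().

```python
from collections import Counter


def summarize_debt(smells, codebase_effort_hours, hourly_rate):
    """Aggregate smell records into the debt metrics listed above.

    Each smell is a dict with "effort_hours", "severity", and "category"
    keys (a hypothetical record shape for this sketch).
    """
    total_hours = sum(smell["effort_hours"] for smell in smells)
    return {
        "total_effort_hours": total_hours,
        # Share of the total codebase effort that is remediation work.
        "debt_ratio": total_hours / codebase_effort_hours,
        # Remediation cost in dollars at the assumed hourly rate.
        "estimated_value": total_hours * hourly_rate,
        "by_severity": dict(Counter(s["severity"] for s in smells)),
        "by_category": dict(Counter(s["category"] for s in smells)),
    }
```

Storing one such summary per sprint makes it straightforward to plot debt_ratio over time.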

Refactoring Planning

Extract Method Opportunities

> Find extract method opportunities in exec_simple_query

╭─────────────── Extract Method Analysis ─────────────────────╮
│                                                              │
│  Function: exec_simple_query()                               │
│  Current Lines: 234                                          │
│                                                              │
│  Suggested Extractions:                                      │
│                                                              │
│  1. Lines 45-78: Authentication handling                     │
│     Suggested name: check_query_authorization()              │
│     Parameters: query_string, session_state                  │
│                                                              │
│  2. Lines 89-134: Query parsing                              │
│     Suggested name: parse_and_analyze_query()                │
│     Parameters: query_string                                 │
│     Returns: ParsedQuery                                     │
│                                                              │
│  3. Lines 156-198: Plan generation                           │
│     Suggested name: generate_query_plan()                    │
│     Parameters: analyzed_query                               │
│     Returns: PlannedQuery                                    │
│                                                              │
│  Impact: Reduces complexity from 47 to ~12 per function      │
│                                                              │
╰──────────────────────────────────────────────────────────────╯

Prioritized Task List

The RefactoringPlanner generates a prioritized task list from all findings:

Refactoring Tasks (by priority):
  1. [HIGH] Remove deprecated_type_check() — 0 callers, safe to delete
  2. [HIGH] Extract 3 methods from exec_simple_query() — complexity 47→12
  3. [MEDIUM] Merge duplicate scan init routines — 3 files, 66 lines saved
  4. [LOW] Inline single_caller helper handle_auth_step() — 1 caller

Each task includes effort estimate, risk level, and expected improvement.
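A task record with these fields, and a simple ordering over it, might look like the sketch below. The RefactoringTask dataclass, its field names, and the tie-break on effort are assumptions for illustration; the RefactoringPlanner may rank tasks by other criteria such as ROI.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


@dataclass
class RefactoringTask:
    priority: str              # "HIGH", "MEDIUM", or "LOW"
    description: str
    effort_hours: float        # effort estimate
    risk: str                  # risk level
    expected_improvement: str


def prioritize(tasks):
    """Order tasks by priority, then by the smallest effort first."""
    return sorted(
        tasks,
        key=lambda task: (PRIORITY_ORDER[task.priority], task.effort_hours),
    )
```

Breaking ties on effort surfaces quick wins, such as deleting a function with no callers, ahead of larger extractions at the same priority.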

CLI Usage

# Query-based refactoring analysis
python -m src.cli query "Find unused functions in executor module"

# Prompt-based deep analysis
python -m src.cli exec --prompt "Suggest refactoring opportunities for high-complexity functions"

# Dead code detection
python -m src.cli query "Find deprecated and unreachable code"

# Clone detection
python -m src.cli query "Find duplicate code patterns"

Example Questions

Dead Code:
- “Find unused static functions”
- “Find deprecated functions” / “Найти устаревшие функции”
- “Find unreachable code after return” / “Найти недостижимый код”
- “Find empty stub implementations”
- “Find orphan components”

Duplication:
- “Show duplicate code patterns”
- “Find copy-paste code in executor”

Complexity:
- “Find functions with high cyclomatic complexity”
- “Find long functions that should be split”
- “Find god classes”

Impact:
- “What would be affected by changing heap_insert?”
- “Suggest refactoring for exec_simple_query”
- “Show blast radius for ExecProcNode changes”