Scenario 13: Mass Refactoring

Tech lead planning and coordinating large-scale symbol/API migrations across the codebase.

Quick Start

# Select Mass Refactoring Scenario
/select 13

How It Works

Architecture (S13 and S05)

S13 (mass_refactoring_workflow) is an alias for refactoring_workflow(state, mode="mass_migration") defined in src/workflow/scenarios/refactoring/workflow.py. It shares the refactoring package with S05:

Workflow                           Mode                    Purpose
refactoring_workflow (S05)         code_smells (default)   Code smell detection, dead code
large_scale_refactoring_workflow   large_scale             Bulk refactoring with ROI analysis
mass_refactoring_workflow (S13)    mass_migration          Symbol/API migrations and renames

When mode="mass_migration", the workflow dispatches directly to mass_migration_workflow() in mass_migration.py (921 lines), bypassing the handler-based Phase 1 used by S05.
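The alias and the mode dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in workflow.py: the `State` type, the returned dictionaries, and `_handler_based_workflow` are hypothetical stand-ins, and only the function names `refactoring_workflow`, `mass_refactoring_workflow`, and `mass_migration_workflow` come from the text.

```python
from typing import Any, Dict

State = Dict[str, Any]  # hypothetical state type, used only for this sketch


def mass_migration_workflow(state: State) -> State:
    # Stand-in for the real function in mass_migration.py.
    return {**state, "handled_by": "mass_migration_workflow"}


def _handler_based_workflow(state: State, mode: str) -> State:
    # Hypothetical placeholder for the handler-based path taken by other modes.
    return {**state, "handled_by": f"handlers:{mode}"}


def refactoring_workflow(state: State, mode: str = "code_smells") -> State:
    # mass_migration short-circuits to its own workflow and skips the handlers.
    if mode == "mass_migration":
        return mass_migration_workflow(state)
    return _handler_based_workflow(state, mode)


def mass_refactoring_workflow(state: State) -> State:
    # S13 entry point: a thin alias that fixes the mode.
    return refactoring_workflow(state, mode="mass_migration")
```

Keeping S13 as a thin alias means both scenarios enter through one function, and the mode argument alone decides which path runs.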

S13 uses a two-stage approach internally:

Query -> Symbol Extraction (5 regex patterns)
  |
  Stage 1: Priority CPG queries (4 levels) + Category queries (9 types)
  |         -> Collect target functions from CPG (no LLM)
  |
  Stage 2: LLM generates refactoring plan based on found symbols
  |         -> Fallback: _build_fallback_answer() if LLM unavailable

Symbol Extraction

The workflow extracts target symbols from the user query using 5 regex patterns, checked in order:

#  Pattern            Example Match                    Regex
1  “rename X to Y”    rename heap_open to table_open   \brename\s+(\w+)\s+to\s+(\w+)
2  snake_case names   heap_open, table_close           \b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b
3  CamelCase names    ExecProcNode, FunctionCall       \b([A-Z][a-zA-Z0-9]+)\b
4  “keyword to X”     references to palloc             (?:references?\s+to|usages?\s+of)\s*(\w+)
5  “X calls”          heap_open calls                  ([a-z_]+)\s+(?:calls?|usages?)

Common words (find, all, update, rename, etc.) are excluded from matching via the COMMON_WORDS set.
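The extraction step can be approximated with the five regexes listed above. The sketch below makes several assumptions that the text does not confirm: the patterns are applied without regex flags, matches from all patterns are collected in order and de-duplicated (rather than stopping at the first pattern that matches), and the contents of `COMMON_WORDS` are a hypothetical subset. The name `extract_symbols` is also illustrative.

```python
import re

# Hypothetical subset; the real COMMON_WORDS set is not shown in this document.
COMMON_WORDS = {"find", "all", "update", "rename", "to", "of", "for",
                "references", "instances", "calls", "usages"}

# The five patterns from the table, in the documented order.
PATTERNS = [
    re.compile(r"\brename\s+(\w+)\s+to\s+(\w+)"),
    re.compile(r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b"),
    re.compile(r"\b([A-Z][a-zA-Z0-9]+)\b"),
    re.compile(r"(?:references?\s+to|usages?\s+of)\s*(\w+)"),
    re.compile(r"([a-z_]+)\s+(?:calls?|usages?)"),
]


def extract_symbols(query: str) -> list[str]:
    """Collect candidate symbols in pattern order, skipping common words."""
    symbols: list[str] = []
    for pattern in PATTERNS:
        for match in pattern.finditer(query):
            for name in match.groups():
                if name.lower() in COMMON_WORDS or name in symbols:
                    continue
                symbols.append(name)
    return symbols


if __name__ == "__main__":
    print(extract_symbols("rename heap_open to table_open"))
    # ['heap_open', 'table_open']
    print(extract_symbols("Find all ExecProcNode references for rename"))
    # ['ExecProcNode']
```

The second query shows why the common-word filter matters: the CamelCase pattern also matches the capitalized word "Find", which is dropped only because its lowercase form is in the set.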

Migration Categories

After symbol extraction, the workflow queries CPG for functions in 9 migration categories:

#  Category       Keywords             Plugin Source
1  Executor       exec, node, rename   sql_patterns["query_execution"]
2  Memory         memory, alloc        get_memory_functions_from_plugin()
3  Table/Heap     table, open          sql_patterns["file_operations"]
4  Error          error                compliance_patterns["error_functions"]
5  Lock           lock, tranche        get_lock_functions_from_plugin()
6  Cache          cache, deprecated    sql_patterns["catalog_cache"]
7  Assert         assert, macro        compliance_patterns["assert_macros"]
8  Slot/Tuple     slot, tuple, attr    refactoring_targets["slot"]
9  FunctionCall   functioncall, call   refactoring_targets["functioncall"]

Each category builds a SQL query against nodes_method joined with edges_call to find functions ranked by caller count.
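A category query of this shape might look like the sketch below. Only the table names nodes_method and edges_call come from the text. The column names (`id`, `name`, `caller_id`, `callee_id`), the LIKE-based name filter, and the use of an in-memory SQLite database as a stand-in for the CPG store are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical column names; only the two table names appear in the document.
CATEGORY_QUERY = """
SELECT m.name, COUNT(e.caller_id) AS caller_count
FROM nodes_method AS m
LEFT JOIN edges_call AS e ON e.callee_id = m.id
WHERE m.name LIKE ?
GROUP BY m.id, m.name
ORDER BY caller_count DESC, m.name
LIMIT ?
"""


def rank_by_callers(conn: sqlite3.Connection, name_pattern: str, limit: int = 10):
    """Return (method_name, caller_count) rows, most-called first."""
    return conn.execute(CATEGORY_QUERY, (name_pattern, limit)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory call graph to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes_method (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE edges_call (caller_id INTEGER, callee_id INTEGER);
        INSERT INTO nodes_method VALUES
            (1, 'palloc'), (2, 'repalloc'), (3, 'pfree'), (4, 'main'), (5, 'helper');
        INSERT INTO edges_call VALUES (4, 1), (5, 1), (4, 2), (4, 3);
    """)
    return conn


if __name__ == "__main__":
    print(rank_by_callers(demo_connection(), "%alloc%"))
    # [('palloc', 2), ('repalloc', 1)]
```

The LEFT JOIN keeps methods with no callers in the result with a count of zero, which helps when a migration target is defined but not yet used.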

Priority Queries

Four priority-level queries run before the category queries so that the most relevant functions appear at the top of the results:

Priority  Category           Method
1         Slot/Tuple         get_tuple_slot_functions_from_plugin(), CASE-based ordering
2         Assert macros      m.name LIKE 'Assert%'
3         FunctionCall       refactoring_targets from plugin
4         Signature update   get_memory_functions_from_plugin(), EN+RU morphological matching

Priority 4 supports Russian keywords (сигнатура, параметр) via keyword_match_morphological().

Complexity Categorization

Found symbols are categorized by refactoring complexity based on caller count:

Category                  Caller Count                             Risk Level
Simple renames            <= min_callers threshold                 Low
Signature modifications   min_callers < count <= high_complexity   Medium
Complex refactors         > high_complexity threshold              High

Thresholds are configured via get_unified_config().thresholds.
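The bucketing rule in the table reduces to two comparisons. In the sketch below the threshold values (5 and 20) are placeholders chosen for the example, since the actual values come from get_unified_config().thresholds and are not stated here. The `Thresholds` dataclass and the `categorize` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values; the real ones come from the unified config.
    min_callers: int = 5
    high_complexity: int = 20


def categorize(caller_count: int, t: Thresholds = Thresholds()) -> tuple[str, str]:
    """Map a caller count to (complexity category, risk level)."""
    if caller_count <= t.min_callers:
        return ("Simple renames", "Low")
    if caller_count <= t.high_complexity:
        return ("Signature modifications", "Medium")
    return ("Complex refactors", "High")


if __name__ == "__main__":
    for count in (3, 12, 40):
        print(count, categorize(count))
    # 3 ('Simple renames', 'Low')
    # 12 ('Signature modifications', 'Medium')
    # 40 ('Complex refactors', 'High')
```

Both boundaries are inclusive on the lower bucket, matching the `<=` comparisons in the table.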

Refactoring Agents

S13 shares four refactoring agents with S05, located in src/refactoring/agents/:

TechnicalDebtDetector

TechnicalDebtDetector(cpg_service=None) — detects code smells using the pattern library.

  • detect_all_smells(limit_per_pattern) — run all patterns, return sorted by severity
  • calculate_debt_metrics(findings) — compute debt ratio and effort metrics
  • detect_pattern(pattern, limit) — run a single pattern

DeadCodeDetector

DeadCodeDetector(cpg_service=None) — specialized detector with 13 dead code patterns:

DEAD_CODE, DEPRECATED_MARKER, DISABLED_CODE_BLOCK, EMPTY_STUB, ERROR_ONLY_FUNCTION, UNREACHABLE_AFTER_RETURN, ORPHAN_COMPONENT, UNUSED_VARIABLE, DEAD_ASSIGNMENT, INVARIANT_DEAD_CODE, DEAD_CALLBACK, SINGLE_CALLER_FUNCTION, TEST_ONLY_FUNCTION.

  • detect_all(limit_per_pattern) — run all 13 patterns
  • detect_patterns(patterns, limit_per_pattern) — run specific patterns
  • get_summary(findings) — summary statistics

ImpactAnalyzer

ImpactAnalyzer(cpg_service=None) — analyzes change impact on dependencies.

  • analyze_method_impact(method_name, filename) — callers, callees, impact score
  • analyze_bulk_impact(findings, limit) — batch analysis of multiple findings

RefactoringPlanner

RefactoringPlanner() — creates prioritized refactoring plans with ROI estimation.

  • create_refactoring_plan(findings, impact_analyses) — priority 1-10 tasks
  • generate_report(findings, impact_analyses, tasks) -> RefactoringReport

Key data models (src/refactoring/agents/models.py):

  • CodeSmellFinding: detected smell (severity, category, method_name, effort_hours)
  • DeadCodeFinding: dead code instance (detection_type, confidence, line_count)
  • ImpactAnalysis: change impact (impact_score, risk_level, direct_dependents)
  • RefactoringTask: prioritized task (priority 1-10, effort_hours, refactoring_steps)
  • RefactoringReport: full report (total_smells, by_severity, total_effort_hours)

Domain-Agnostic Architecture

S13 loads all migration data from domain plugins, never hardcoding domain-specific symbols. Six helper functions from src/workflow/_plugin_helpers.py provide the data:

Helper                                   Returns
get_refactoring_patterns_from_plugin()   Target refactoring patterns
get_sql_query_patterns_from_plugin()     SQL patterns (executor, table, cache)
get_memory_functions_from_plugin()       Memory allocation functions
get_lock_functions_from_plugin()         Lock/synchronization functions
get_compliance_patterns_from_plugin()    Error and assert patterns
get_tuple_slot_functions_from_plugin()   Slot/tuple access functions

Additionally, DomainPluginV3 provides get_migration_keywords() (category-to-keyword mapping) and get_migration_descriptions() (category-to-description mapping) used by the fallback answer generator.

When the LLM is unavailable, _build_fallback_answer() generates a structured report from these plugin-provided descriptions, organized by matched migration category.

Risk Analysis

CallGraphAnalyzer and Blast Radius

The refactoring workflow (shared by S05/S13) uses CallGraphAnalyzer(cpg) from src/analysis/ to assess refactoring risk:

  • find_all_callers(method_name, max_depth) — transitive caller chain
  • find_all_callees(method_name, max_depth) — transitive callee chain
  • analyze_impact(method_name) — impact score

Blast radius = total callers + total callees. Methods are classified as:

  • Safe to refactor: blast radius < blast_radius_medium threshold
  • Low risk: < caller_count_medium
  • Medium risk: < caller_count_high
  • High risk: >= caller_count_high
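One way to read the classification is sketched below. The text does not say which quantity the low, medium, and high tiers compare against the caller_count_* thresholds. This sketch assumes it is the caller count, with blast radius used only for the "safe to refactor" check. The default threshold values and the function name `classify_risk` are likewise placeholders.

```python
def classify_risk(
    total_callers: int,
    total_callees: int,
    blast_radius_medium: int = 10,   # placeholder value
    caller_count_medium: int = 20,   # placeholder value
    caller_count_high: int = 50,     # placeholder value
) -> str:
    """Classify a method by blast radius first, then by caller count."""
    blast_radius = total_callers + total_callees
    if blast_radius < blast_radius_medium:
        return "safe to refactor"
    # Assumption: the remaining tiers compare the caller count, not the radius.
    if total_callers < caller_count_medium:
        return "low risk"
    if total_callers < caller_count_high:
        return "medium risk"
    return "high risk"


if __name__ == "__main__":
    print(classify_risk(2, 3))    # safe to refactor
    print(classify_risk(8, 30))   # low risk
    print(classify_risk(25, 5))   # medium risk
    print(classify_risk(50, 0))   # high risk
```

Under this reading, a method with few callers but many callees (such as the second example) leaves the safe tier yet still counts as low risk, because renaming it touches few call sites.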

Betweenness Centrality Risk

DependencyAnalyzer(cpg).identify_architectural_chokepoints() calculates betweenness centrality for each method. Methods above the compliance_score_high percentile threshold are flagged as critical architectural chokepoints — refactoring these requires extra care because many code paths depend on them.
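To illustrate the idea, the sketch below computes unnormalized betweenness centrality on a small directed call graph using Brandes' algorithm, then flags methods at or above a percentile cutoff. It is not the DependencyAnalyzer implementation: the adjacency-dict input, the nearest-rank percentile rule, the default of 90, and the function names are all assumptions made for the example.

```python
import math
from collections import deque


def betweenness_centrality(graph: dict[str, list[str]]) -> dict[str, float]:
    """Brandes' algorithm for an unweighted directed graph (caller -> callees)."""
    nodes = set(graph)
    for callees in graph.values():
        nodes.update(callees)
    centrality = {n: 0.0 for n in nodes}
    for source in nodes:
        order = []                           # nodes in BFS visit order
        preds = {n: [] for n in nodes}       # shortest-path predecessors
        sigma = {n: 0 for n in nodes}        # number of shortest paths from source
        dist = {n: -1 for n in nodes}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in nodes}      # dependency of source on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality


def chokepoints(graph: dict[str, list[str]], percentile: float = 90.0) -> list[str]:
    """Methods whose score is at or above the nearest-rank percentile cutoff."""
    scores = betweenness_centrality(graph)
    ordered = sorted(scores.values())
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    cutoff = ordered[rank - 1]
    return sorted(n for n, s in scores.items() if s >= cutoff and s > 0)


if __name__ == "__main__":
    calls = {"a": ["hub"], "b": ["hub"], "hub": ["x", "y"]}
    print(chokepoints(calls))  # ['hub']
```

In the example every path from a caller (a, b) to a leaf (x, y) passes through hub, so it is the only method with a nonzero score and the only one flagged.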

CLI Usage

# Mass refactoring of a symbol
python -m src.cli query "Rename all instances of heap_open to table_open"

# Find migration targets
python -m src.cli query "Find all ExecProcNode references for rename"

# Memory API migration
python -m src.cli query "Plan mass refactoring of memory allocation functions"

# Signature changes
python -m src.cli query "Find functions with signature changes needed"

# Assert macro standardization
python -m src.cli query "Find all Assert macro usages for standardization"

Example Questions

  • “Rename all instances of [old_name] to [new_name]”
  • “Find all references to [function] for mass refactoring”
  • “Plan mass refactoring of memory allocation functions”
  • “Find functions with signature changes needed”
  • “Find all Assert macro usages for standardization”
  • “Show slot/tuple access patterns for refactoring”
  • “Find FunctionCall patterns for modernization”
  • “Plan migration of lock API functions”
  • “Find all cache function usages for deprecation”
  • “Show error handling functions for migration”

S13 vs S05 vs S12: S13 focuses on large-scale symbol/API migrations — renaming functions, updating signatures, and modernizing APIs across the entire codebase. S05 focuses on detecting code smells, dead code, and code clones with actionable refactoring tasks. S12 focuses on measuring and prioritizing technical debt. All three share the refactoring agent infrastructure (TechnicalDebtDetector, DeadCodeDetector, ImpactAnalyzer, RefactoringPlanner) but serve different use cases.