Tech lead planning and coordinating large-scale symbol/API migrations across the codebase.
Table of Contents¶
- Quick Start
- How It Works
- Architecture (S13 and S05)
- Symbol Extraction
- Migration Categories
- Priority Queries
- Complexity Categorization
- Refactoring Agents
- TechnicalDebtDetector
- DeadCodeDetector
- ImpactAnalyzer
- RefactoringPlanner
- Domain-Agnostic Architecture
- Risk Analysis
- CallGraphAnalyzer and Blast Radius
- Betweenness Centrality Risk
- CLI Usage
- Example Questions
- Related Scenarios
Quick Start¶
```
# Select Mass Refactoring Scenario
/select 13
```
How It Works¶
Architecture (S13 and S05)¶
S13 (mass_refactoring_workflow) is an alias for refactoring_workflow(state, mode="mass_migration") defined in src/workflow/scenarios/refactoring/workflow.py. It shares the refactoring package with S05:
| Workflow | Mode | Purpose |
|---|---|---|
| refactoring_workflow (S05) | code_smells (default) | Code smell detection, dead code |
| large_scale_refactoring_workflow | large_scale | Bulk refactoring with ROI analysis |
| mass_refactoring_workflow (S13) | mass_migration | Symbol/API migrations and renames |
When mode="mass_migration", the workflow dispatches directly to mass_migration_workflow() in mass_migration.py (921 lines), bypassing the handler-based Phase 1 used by S05.
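The alias-and-dispatch relationship described above can be sketched as follows. This is only an illustration: the function bodies are stand-ins, and `handler_based_workflow` is a hypothetical name for the handler-based path used by S05, not something taken from the codebase.

```python
def mass_migration_workflow(state):
    # Stand-in for the real function in mass_migration.py.
    return {**state, "handled_by": "mass_migration_workflow"}


def handler_based_workflow(state, mode):
    # Hypothetical stand-in for the handler-based Phase 1 path (S05).
    return {**state, "handled_by": f"handlers:{mode}"}


def refactoring_workflow(state, mode="code_smells"):
    # mass_migration is dispatched directly and skips the handler-based path.
    if mode == "mass_migration":
        return mass_migration_workflow(state)
    return handler_based_workflow(state, mode)


def mass_refactoring_workflow(state):
    # S13 is a thin alias that fixes the mode.
    return refactoring_workflow(state, mode="mass_migration")
```

Calling `mass_refactoring_workflow({"query": "..."})` in this sketch never touches the handler path, which mirrors the bypass described above.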
S13 uses a two-stage approach internally:
```
Query -> Symbol Extraction (5 regex patterns)
    |
Stage 1: Priority CPG queries (4 levels) + Category queries (9 types)
    |   -> Collect target functions from CPG (no LLM)
    |
Stage 2: LLM generates refactoring plan based on found symbols
    |   -> Fallback: _build_fallback_answer() if LLM unavailable
```
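The control flow of the two stages can be sketched in a few lines. The three callables and the `LLMUnavailable` exception are hypothetical placeholders introduced for illustration; the real workflow's signatures and error handling may differ.

```python
class LLMUnavailable(Exception):
    """Hypothetical error raised when no LLM backend can be reached."""


def run_two_stage(query, collect_targets, generate_plan, build_fallback):
    # Stage 1: CPG queries only, no LLM involved.
    targets = collect_targets(query)
    try:
        # Stage 2: the LLM writes the plan from the collected symbols.
        return generate_plan(query, targets)
    except LLMUnavailable:
        # Deterministic fallback built from the same Stage 1 results.
        return build_fallback(query, targets)
```

Keeping Stage 1 free of LLM calls means the fallback still has the full list of target functions to report on.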
Symbol Extraction¶
The workflow extracts target symbols from the user query using 5 regex patterns, checked in order:
| # | Pattern | Example Match | Regex |
|---|---|---|---|
| 1 | “rename X to Y” | rename heap_open to table_open | `\brename\s+(\w+)\s+to\s+(\w+)` |
| 2 | snake_case names | heap_open, table_close | `\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b` |
| 3 | CamelCase names | ExecProcNode, FunctionCall | `\b([A-Z][a-zA-Z0-9]+)\b` |
| 4 | “keyword to X” | references to palloc | `(?:references?\s+to\|usages?\s+of)\s*(\w+)` |
| 5 | “X calls” | heap_open calls | `([a-z_]+)\s+(?:calls?\|usages?)` |
Common words (find, all, update, rename, etc.) are excluded from the results via the COMMON_WORDS set.
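A minimal sketch of this extraction step, using the five regexes from the table. The regex flags, the de-duplication, and the contents of `COMMON_WORDS` beyond the four words listed above are assumptions made for illustration.

```python
import re

# Regexes copied from the table; the IGNORECASE flags are an assumption.
PATTERNS = [
    re.compile(r"\brename\s+(\w+)\s+to\s+(\w+)", re.IGNORECASE),
    re.compile(r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b"),
    re.compile(r"\b([A-Z][a-zA-Z0-9]+)\b"),
    re.compile(r"(?:references?\s+to|usages?\s+of)\s*(\w+)", re.IGNORECASE),
    re.compile(r"([a-z_]+)\s+(?:calls?|usages?)"),
]

# Illustrative subset; the real COMMON_WORDS set is larger.
COMMON_WORDS = {"find", "all", "update", "rename", "instances", "of", "to", "for"}


def extract_symbols(query):
    """Return candidate symbols in pattern order, without duplicates."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(query):
            for name in match.groups():
                if name.lower() in COMMON_WORDS or name in found:
                    continue
                found.append(name)
    return found
```

For example, `extract_symbols("Find all references to palloc")` yields `["palloc"]`: "Find" is matched by the CamelCase pattern but dropped as a common word.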
Migration Categories¶
After symbol extraction, the workflow queries CPG for functions in 9 migration categories:
| # | Category | Keywords | Plugin Source |
|---|---|---|---|
| 1 | Executor | exec, node, rename | sql_patterns["query_execution"] |
| 2 | Memory | memory, alloc | get_memory_functions_from_plugin() |
| 3 | Table/Heap | table, open | sql_patterns["file_operations"] |
| 4 | Error | error | compliance_patterns["error_functions"] |
| 5 | Lock | lock, tranche | get_lock_functions_from_plugin() |
| 6 | Cache | cache, deprecated | sql_patterns["catalog_cache"] |
| 7 | Assert | assert, macro | compliance_patterns["assert_macros"] |
| 8 | Slot/Tuple | slot, tuple, attr | refactoring_targets["slot"] |
| 9 | FunctionCall | functioncall, call | refactoring_targets["functioncall"] |
Each category builds a SQL query against nodes_method joined with edges_call to find functions ranked by caller count.
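The shape of such a query can be sketched against an in-memory SQLite database. Only the table names `nodes_method` and `edges_call` come from the text above; the column names (`id`, `name`, `caller_id`, `callee_id`), the use of `LIKE` patterns, and the sample rows are assumptions.

```python
import sqlite3


def rank_by_callers(conn, name_patterns, limit=10):
    """Return (method name, distinct caller count) rows, most-called first."""
    if not name_patterns:
        return []
    where = " OR ".join("m.name LIKE ?" for _ in name_patterns)
    sql = f"""
        SELECT m.name, COUNT(DISTINCT e.caller_id) AS caller_count
        FROM nodes_method AS m
        LEFT JOIN edges_call AS e ON e.callee_id = m.id
        WHERE {where}
        GROUP BY m.id, m.name
        ORDER BY caller_count DESC, m.name
        LIMIT ?
    """
    return conn.execute(sql, [*name_patterns, limit]).fetchall()


def demo_db():
    # Tiny hypothetical call graph: main and pfree call palloc, main calls pfree.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes_method (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE edges_call (caller_id INTEGER, callee_id INTEGER);
        INSERT INTO nodes_method VALUES
            (1, 'palloc'), (2, 'pfree'), (3, 'main'), (4, 'repalloc');
        INSERT INTO edges_call VALUES (3, 1), (2, 1), (3, 2);
    """)
    return conn
```

The `LEFT JOIN` keeps functions with no recorded callers in the result with a count of zero, so they still show up as low-risk candidates.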
Priority Queries¶
Four priority-level queries run before the category queries so that the most relevant functions appear at the top of the results:
| Priority | Category | Method |
|---|---|---|
| 1 | Slot/Tuple | get_tuple_slot_functions_from_plugin(), CASE-based ordering |
| 2 | Assert macros | m.name LIKE 'Assert%' |
| 3 | FunctionCall | refactoring_targets from plugin |
| 4 | Signature update | get_memory_functions_from_plugin(), EN+RU morphological matching |
Priority 4 also matches Russian keywords (сигнатура "signature", параметр "parameter") via keyword_match_morphological().
Complexity Categorization¶
Found symbols are categorized by refactoring complexity based on caller count:
| Category | Caller Count | Risk Level |
|---|---|---|
| Simple renames | <= min_callers threshold | Low |
| Signature modifications | min_callers < count <= high_complexity | Medium |
| Complex refactors | > high_complexity threshold | High |
Thresholds are configured via get_unified_config().thresholds.
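The table translates into a small classification function. The `Thresholds` class and its default values are hypothetical; in the workflow the numbers come from get_unified_config().thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical defaults, chosen only for the example.
    min_callers: int = 5
    high_complexity: int = 50


def categorize(caller_count, thresholds=Thresholds()):
    """Map a caller count to (category, risk level) as in the table."""
    if caller_count <= thresholds.min_callers:
        return ("Simple renames", "Low")
    if caller_count <= thresholds.high_complexity:
        return ("Signature modifications", "Medium")
    return ("Complex refactors", "High")
```

With these example thresholds, a function with 5 callers is a simple rename, while one with 51 callers is a complex refactor.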
Refactoring Agents¶
S13 shares four refactoring agents with S05, located in src/refactoring/agents/:
TechnicalDebtDetector¶
TechnicalDebtDetector(cpg_service=None) detects code smells using the pattern library.
- detect_all_smells(limit_per_pattern): run all patterns, return findings sorted by severity
- calculate_debt_metrics(findings): compute debt ratio and effort metrics
- detect_pattern(pattern, limit): run a single pattern
DeadCodeDetector¶
DeadCodeDetector(cpg_service=None) is a specialized detector with 13 dead code patterns:
DEAD_CODE, DEPRECATED_MARKER, DISABLED_CODE_BLOCK, EMPTY_STUB, ERROR_ONLY_FUNCTION, UNREACHABLE_AFTER_RETURN, ORPHAN_COMPONENT, UNUSED_VARIABLE, DEAD_ASSIGNMENT, INVARIANT_DEAD_CODE, DEAD_CALLBACK, SINGLE_CALLER_FUNCTION, TEST_ONLY_FUNCTION.
- detect_all(limit_per_pattern): run all 13 patterns
- detect_patterns(patterns, limit_per_pattern): run specific patterns
- get_summary(findings): summary statistics
ImpactAnalyzer¶
ImpactAnalyzer(cpg_service=None) analyzes change impact on dependencies.
- analyze_method_impact(method_name, filename): callers, callees, impact score
- analyze_bulk_impact(findings, limit): batch analysis of multiple findings
RefactoringPlanner¶
RefactoringPlanner() creates prioritized refactoring plans with ROI estimation.
- create_refactoring_plan(findings, impact_analyses): returns tasks with priority 1-10
- generate_report(findings, impact_analyses, tasks): returns a RefactoringReport
Key data models (src/refactoring/agents/models.py):
- CodeSmellFinding — detected smell (severity, category, method_name, effort_hours)
- DeadCodeFinding — dead code instance (detection_type, confidence, line_count)
- ImpactAnalysis — change impact (impact_score, risk_level, direct_dependents)
- RefactoringTask — prioritized task (priority 1-10, effort_hours, refactoring_steps)
- RefactoringReport — full report (total_smells, by_severity, total_effort_hours)
Domain-Agnostic Architecture¶
S13 loads all migration data from domain plugins, never hardcoding domain-specific symbols. Six helper functions from src/workflow/_plugin_helpers.py provide the data:
| Helper | Returns |
|---|---|
| get_refactoring_patterns_from_plugin() | Target refactoring patterns |
| get_sql_query_patterns_from_plugin() | SQL patterns (executor, table, cache) |
| get_memory_functions_from_plugin() | Memory allocation functions |
| get_lock_functions_from_plugin() | Lock/synchronization functions |
| get_compliance_patterns_from_plugin() | Error and assert patterns |
| get_tuple_slot_functions_from_plugin() | Slot/tuple access functions |
Additionally, DomainPluginV3 provides get_migration_keywords() (category-to-keyword mapping) and get_migration_descriptions() (category-to-description mapping) used by the fallback answer generator.
When the LLM is unavailable, _build_fallback_answer() generates a structured report from these plugin-provided descriptions, organized by the migration categories that matched the query.
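A rough sketch of how such a report could be assembled from the two plugin mappings. The function below is not the real _build_fallback_answer(): its signature, the plain substring keyword matching, and the report layout are all assumptions.

```python
def build_fallback_report(query, keywords, descriptions, targets_by_category):
    """Build a plain-text report for the categories whose keywords occur in the query.

    keywords:            category -> list of keywords (as from get_migration_keywords())
    descriptions:        category -> description (as from get_migration_descriptions())
    targets_by_category: category -> list of (function name, caller count) pairs
    """
    lowered = query.lower()
    matched = [
        category
        for category, words in keywords.items()
        if any(word in lowered for word in words)
    ]
    lines = [f"Migration report for: {query}"]
    for category in matched:
        description = descriptions.get(category, "no description available")
        lines.append(f"{category}: {description}")
        for name, callers in targets_by_category.get(category, []):
            lines.append(f"  - {name} ({callers} callers)")
    if not matched:
        lines.append("No migration category matched the query.")
    return "\n".join(lines)
```

Because every string in the report comes from the plugin mappings or from Stage 1 results, the sketch stays domain-agnostic in the same way the workflow does.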
Risk Analysis¶
CallGraphAnalyzer and Blast Radius¶
The refactoring workflow (shared by S05/S13) uses CallGraphAnalyzer(cpg) from src/analysis/ to assess refactoring risk:
- find_all_callers(method_name, max_depth): transitive caller chain
- find_all_callees(method_name, max_depth): transitive callee chain
- analyze_impact(method_name): impact score
Blast radius = total callers + total callees. Methods are classified as:
- Safe to refactor: blast radius < blast_radius_medium threshold
- Low risk: < caller_count_medium
- Medium risk: < caller_count_high
- High risk: >= caller_count_high
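The blast radius computation can be sketched with a depth-limited breadth-first walk over a call graph. The dictionary-based graph, the helper names, and the default `max_depth` are assumptions for illustration; CallGraphAnalyzer works on the CPG and may count differently.

```python
from collections import deque


def reachable(graph, start, max_depth):
    """Nodes reachable from start within max_depth hops (start excluded)."""
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, ()):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def invert(calls):
    """Turn a caller -> callees mapping into callee -> callers."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def blast_radius(calls, method, max_depth=3):
    """Total transitive callers plus total transitive callees."""
    all_callers = reachable(invert(calls), method, max_depth)
    all_callees = reachable(calls, method, max_depth)
    return len(all_callers) + len(all_callees)
```

For a graph where `main` calls `parse` and `run`, both of which call `heap_open`, and `heap_open` calls `lock` and `read`, the blast radius of `heap_open` is 3 callers plus 2 callees.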
Betweenness Centrality Risk¶
DependencyAnalyzer(cpg).identify_architectural_chokepoints() calculates betweenness centrality for each method. Methods above the compliance_score_high percentile threshold are flagged as critical architectural chokepoints — refactoring these requires extra care because many code paths depend on them.
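To make the chokepoint idea concrete, here is a small pure-Python sketch of betweenness centrality on a directed, unweighted call graph using Brandes' algorithm, followed by a percentile cutoff. It is not the DependencyAnalyzer implementation; the graph format, the nearest-rank cutoff, and the default percentile are assumptions, and the adjacency lists are assumed to contain no duplicate edges.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality for a directed, unweighted graph."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        order = []                              # nodes in non-decreasing distance
        preds = {v: [] for v in nodes}          # shortest-path predecessors
        sigma = dict.fromkeys(nodes, 0)         # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)       # dependency accumulation
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return score


def chokepoints(scores, percentile=90):
    """Nodes whose score reaches the given percentile (nearest rank), ignoring zeros."""
    if not scores:
        return []
    ranked = sorted(scores.values())
    cutoff = ranked[min(len(ranked) - 1, int(len(ranked) * percentile / 100))]
    return sorted(n for n, s in scores.items() if s >= cutoff and s > 0)
```

In a graph where `a` and `b` both call `c`, and `c` calls `d` and `e`, every path between the two sides runs through `c`, so `c` is the only node with a non-zero score and the only chokepoint.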
CLI Usage¶
```bash
# Mass refactoring of a symbol
python -m src.cli query "Rename all instances of heap_open to table_open"

# Find migration targets
python -m src.cli query "Find all ExecProcNode references for rename"

# Memory API migration
python -m src.cli query "Plan mass refactoring of memory allocation functions"

# Signature changes
python -m src.cli query "Find functions with signature changes needed"

# Assert macro standardization
python -m src.cli query "Find all Assert macro usages for standardization"
```
Example Questions¶
- “Rename all instances of [old_name] to [new_name]”
- “Find all references to [function] for mass refactoring”
- “Plan mass refactoring of memory allocation functions”
- “Find functions with signature changes needed”
- “Find all Assert macro usages for standardization”
- “Show slot/tuple access patterns for refactoring”
- “Find FunctionCall patterns for modernization”
- “Plan migration of lock API functions”
- “Find all cache function usages for deprecation”
- “Show error handling functions for migration”
Related Scenarios¶
- Refactoring - Code smell detection and dead code (S05)
- Tech Debt - Technical debt analysis and prioritization (S12)
- Architecture - Dependency and coupling analysis
- Code Review - Review mass changes
S13 vs S05 vs S12: S13 focuses on large-scale symbol/API migrations — renaming functions, updating signatures, and modernizing APIs across the entire codebase. S05 focuses on detecting code smells, dead code, and code clones with actionable refactoring tasks. S12 focuses on measuring and prioritizing technical debt. All three share the refactoring agent infrastructure (TechnicalDebtDetector, DeadCodeDetector, ImpactAnalyzer, RefactoringPlanner) but serve different use cases.