S10 (Cross-Repository) supports an architect analyzing dependencies, duplications, and consolidation opportunities across multiple repositories, using code property graph (CPG) based agents and graph methods.
Table of Contents
- Quick Start
- How It Works
- Two-Phase Architecture
- Intent Detection
- Handler Phase
- 3 Handlers
- CrossRepoHandler Base Methods
- Report Formatters
- LLM Phase
- 3 Agents
- CallGraphAnalyzer Integration
- Graph Insights
- Data Models
- Configuration
- CLI Usage
- Example Questions
- Related Scenarios
Quick Start
# Select Cross-Repository Scenario
/select 10
How It Works
Two-Phase Architecture
S10 uses the standard two-phase architecture with handler-based Phase 1 and LLM fallback Phase 2:
Query -> CrossRepoIntentDetector.detect()
|
Phase 1: integrate_handlers(state)
-> HandlerRegistry("cross_repo") -> match handler by intent
-> handler.handle() -> CrossRepoReportFormatter -> answer
|
handled=True -> return formatted result (no LLM)
handled=False -> Phase 2
|
Phase 2: cross_repo_workflow() [LLM fallback]
-> 3 Agents (RepositoryIndexer, CrossRepoAnalyzer, DependencyMapper)
-> CallGraphAnalyzer (graph insights)
-> PromptRegistry -> LLMInterface.generate() -> answer
cross_repo_workflow() in cross_repo.py first calls integrate_handlers(state). If a handler matches (handled=True), its formatted result is returned without calling the LLM. Otherwise, the full LLM-based workflow runs with the 3 agents and graph analysis.
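The control flow above can be sketched in a few lines. This is a minimal illustration, not the project's code: `run_two_phase`, `demo_handlers`, and `demo_llm` are hypothetical names, and the state is modelled as a plain dict carrying the `handled` flag from the diagram.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_two_phase(
    state: State,
    phase1: Callable[[State], State],
    phase2: Callable[[State], State],
) -> State:
    """Try the handler phase first; fall back to the LLM phase if unhandled."""
    state = phase1(state)
    if state.get("handled"):
        # A handler produced a formatted answer, so phase2 is never called.
        return state
    return phase2(state)


# Hypothetical stand-in for integrate_handlers(state).
def demo_handlers(state: State) -> State:
    if state.get("intent") == "duplicate_detection":
        return {**state, "handled": True, "answer": "formatted duplicate report"}
    return {**state, "handled": False}


# Hypothetical stand-in for the LLM fallback pipeline.
def demo_llm(state: State) -> State:
    return {**state, "answer": "LLM-generated answer", "used_llm": True}


handled = run_two_phase({"intent": "duplicate_detection"}, demo_handlers, demo_llm)
fallback = run_two_phase({"intent": "general_cross_repo"}, demo_handlers, demo_llm)
```

Passing the two phases as callables keeps the sketch independent of the real handler registry and LLM interface.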
Intent Detection
CrossRepoIntentDetector(IntentDetector) in cross_repo_handlers/intent_detector.py defines 6 intents sorted by priority:
| Intent | Priority | Keywords (EN + RU) |
|---|---|---|
| hook_usage | 5 | hook, extension hook, callback, хук, точка расширения |
| spi_dependency | 8 | spi, server programming interface, серверный интерфейс |
| code_consolidation | 10 | consolidate, merge, combine, unified, консолидация, объединить |
| duplicate_detection | 20 | duplicate, clone, copy, redundant, дубликат, клон |
| cross_repo_search | 30 | across repo, cross repo, all repo, кросс-репозиторий |
| consistency_check | 40 | consistency, standard, naming convention, согласованность |
Fallback: general_cross_repo (confidence=0.5) when no pattern matches.
Keyword matching uses keyword_match_morphological() for Russian lemma support. Domain-specific hook keywords are loaded dynamically via _get_domain_hook_keywords() from the active domain plugin.
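A rough sketch of priority-ordered keyword detection follows. It makes several assumptions that the text does not confirm: intents with a lower priority value are checked first, plain substring matching stands in for keyword_match_morphological(), and `MATCH_CONFIDENCE` is a placeholder value. Only the 0.5 fallback confidence comes from the text above.

```python
from typing import List, Tuple

# (intent, priority, keywords) copied from the table above.
INTENTS: List[Tuple[str, int, List[str]]] = [
    ("hook_usage", 5, ["hook", "extension hook", "callback", "хук", "точка расширения"]),
    ("spi_dependency", 8, ["spi", "server programming interface", "серверный интерфейс"]),
    ("code_consolidation", 10, ["consolidate", "merge", "combine", "unified", "консолидация", "объединить"]),
    ("duplicate_detection", 20, ["duplicate", "clone", "copy", "redundant", "дубликат", "клон"]),
    ("cross_repo_search", 30, ["across repo", "cross repo", "all repo", "кросс-репозиторий"]),
    ("consistency_check", 40, ["consistency", "standard", "naming convention", "согласованность"]),
]

MATCH_CONFIDENCE = 0.9  # placeholder; the real confidence values are not documented here


def detect_intent(query: str) -> Tuple[str, float]:
    """Return (intent, confidence) for the first intent whose keyword occurs in the query."""
    text = query.lower()
    # Assumption: a lower priority value means the intent is checked earlier.
    for intent, _priority, keywords in sorted(INTENTS, key=lambda item: item[1]):
        if any(keyword in text for keyword in keywords):
            return intent, MATCH_CONFIDENCE
    return "general_cross_repo", 0.5
```

With this ordering, "Find duplicate code across repositories" resolves to duplicate_detection even though it also contains the cross_repo_search keyword "across repo". Substring matching misses inflected forms such as "consolidation" for the keyword "consolidate", which is the kind of gap that lemma-aware matching is meant to close.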
Handler Phase
3 Handlers
cross_repo_handlers/workflow.py registers 3 handlers in HandlerRegistry("cross_repo"):
| Handler | Priority | Intent | Description |
|---|---|---|---|
| ExtensionDependenciesHandler | 5 | hook_usage, spi_dependency | Extension API and SPI dependency analysis |
| ConsolidationHandler | 10 | code_consolidation | Code consolidation candidate detection |
| DuplicateDetectionHandler | 20 | duplicate_detection | Duplicate code detection across files |
All handlers inherit from CrossRepoHandler(BaseHandler).
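The handler table can be read as a lookup from intent to handler. The sketch below encodes that reading with hypothetical names (`HandlerSpec`, `match_handler`); the real HandlerRegistry API is not shown in this document, so treat this as an illustration of the mapping only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HandlerSpec:
    name: str
    priority: int
    intents: Tuple[str, ...]


# Rows from the handler table above.
HANDLERS: List[HandlerSpec] = [
    HandlerSpec("ExtensionDependenciesHandler", 5, ("hook_usage", "spi_dependency")),
    HandlerSpec("ConsolidationHandler", 10, ("code_consolidation",)),
    HandlerSpec("DuplicateDetectionHandler", 20, ("duplicate_detection",)),
]


def match_handler(intent: str) -> Optional[HandlerSpec]:
    """Return the first handler (lowest priority value) that accepts the intent."""
    for spec in sorted(HANDLERS, key=lambda s: s.priority):
        if intent in spec.intents:
            return spec
    return None
```

In this sketch, intents without a row in the table (cross_repo_search, consistency_check, and the general_cross_repo fallback) return None, which would correspond to handled=False and a hand-off to Phase 2.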
CrossRepoHandler Base Methods
CrossRepoHandler in cross_repo_handlers/handlers/base.py provides 4 shared CPG query methods:
| Method | Description |
|---|---|
| _find_duplicate_code(min_similarity, limit) | Find duplicate code patterns via signature matching |
| _search_across_files(pattern, scope, limit) | Search for methods matching a pattern across files |
| _check_naming_consistency(pattern_type) | Check naming convention consistency (snake_case, camelCase, PascalCase) |
| _analyze_consolidation_candidates(threshold) | Find code consolidation candidates grouped by signature |
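To make the naming consistency check concrete, here is a minimal sketch that classifies identifiers into the three styles named in the table. The regular expressions and the helper names (`classify_name`, `naming_summary`) are assumptions for illustration; the actual _check_naming_consistency() queries the CPG and may apply different rules.

```python
import re
from collections import Counter
from typing import Iterable

# Simplified patterns; a single lowercase word is counted as snake_case here.
STYLE_PATTERNS = {
    "snake_case": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$"),
    "camelCase": re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$"),
    "PascalCase": re.compile(r"^([A-Z][a-z0-9]*)+$"),
}


def classify_name(name: str) -> str:
    """Return the first style whose pattern matches, or "other"."""
    for style, pattern in STYLE_PATTERNS.items():
        if pattern.match(name):
            return style
    return "other"


def naming_summary(names: Iterable[str]) -> Counter:
    """Count how many names fall into each style."""
    return Counter(classify_name(name) for name in names)


summary = naming_summary(
    ["heap_insert", "ExecInitNode", "palloc", "getNextTuple", "XLogInsert"]
)
```

A codebase where one style dominates and a few names fall into another gives a simple consistency signal: the minority names are the candidates to review.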
ExtensionDependenciesHandler additionally uses get_extension_dependency_patterns_from_plugin() and build_sql_like_clause() from _plugin_helpers to load domain-specific extension API patterns (domain-agnostic approach).
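One plausible shape for turning plugin-supplied patterns into a query filter is a parameterized LIKE clause. The sketch below is an assumption about the idea, not the signature of build_sql_like_clause(): `like_clause_sketch`, the `methods` table, and the example patterns are all hypothetical, and the standard library's sqlite3 is swapped in for DuckDB so the example runs without extra dependencies.

```python
import sqlite3
from typing import List, Sequence, Tuple


def like_clause_sketch(column: str, patterns: Sequence[str]) -> Tuple[str, List[str]]:
    """Build "(col LIKE ? OR col LIKE ?)" plus its parameter list.

    The column name is interpolated, so it must come from trusted code;
    the patterns are passed as bound parameters.
    """
    if not patterns:
        # No patterns: return a clause that matches nothing.
        return "(1 = 0)", []
    clause = " OR ".join(f"{column} LIKE ?" for _ in patterns)
    return f"({clause})", list(patterns)


clause, params = like_clause_sketch("name", ["palloc%", "%hook"])

# Hypothetical in-memory table standing in for CPG method data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE methods (name TEXT)")
conn.executemany(
    "INSERT INTO methods VALUES (?)",
    [("palloc0",), ("planner_hook",), ("heap_insert",)],
)
rows = conn.execute(
    f"SELECT name FROM methods WHERE {clause} ORDER BY name", params
).fetchall()
conn.close()
```

Keeping the patterns in data rather than in the query text is what lets the same handler serve different domains.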
Report Formatters
CrossRepoReportFormatter(CrossRepoFormatter) in cross_repo_handlers/formatters/cross_repo_report.py provides 2 report formatters:
| Method | Used by |
|---|---|
| format_consolidation_report(report_data, language) | ConsolidationHandler |
| format_duplicate_report(report_data, language) | DuplicateDetectionHandler |
CrossRepoFormatter(BaseFormatter) provides helper methods: format_file_list(), format_consolidation_badge(), format_similarity_badge(). All formatters support EN/RU localization.
LLM Phase
3 Agents
When no handler matches, cross_repo_workflow() executes the full LLM pipeline with 3 agents from src/cross_repo/cross_repo_agents.py:
| Agent | Role | Key Methods |
|---|---|---|
| RepositoryIndexer | Discover repos, extract metadata, index into CPG | discover_repositories(path), index_repository_cpg(repo) |
| CrossRepoAnalyzer | Detect code duplications and consolidation opportunities | find_code_duplications(repos, min_similarity, min_lines), find_similar_utilities(repos), identify_consolidation_opportunities(dups) |
| DependencyMapper | Map dependencies, detect circular deps, generate report | map_dependencies(repos), generate_dependency_graph(deps), detect_circular_dependencies(graph), generate_dependency_report(...) |
The pipeline:
1. RepositoryIndexer discovers and indexes repositories from workspace_path
2. CrossRepoAnalyzer finds duplications and consolidation opportunities
3. DependencyMapper maps dependencies, detects circular dependencies, generates report
4. An evidence list is built from the duplications, opportunities, and high-risk dependencies
5. PromptRegistry.get_agent_prompt("cross_repo_analyzer", ...) builds the prompt
6. LLMInterface().generate() produces the final answer
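Step 3 includes circular dependency detection. A common way to do this is depth-first search with back-edge detection, sketched below. The graph representation (a dict mapping each repository to the set of repositories it depends on) and the function name `find_cycles` are assumptions; detect_circular_dependencies(graph) may use a different algorithm and data structure.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def find_cycles(graph: Graph) -> List[List[str]]:
    """Return one representative cycle per back edge found by depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []
    cycles: List[List[str]] = []

    def visit(node: str) -> None:
        color[node] = GREY
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path from nxt to node, closed by nxt.
                cycles.append(stack[stack.index(nxt):] + [nxt])
            elif state == WHITE:
                visit(nxt)
        stack.pop()
        color[node] = BLACK

    for node in sorted(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles


# Hypothetical repositories: core -> utils -> ext_a -> core forms a cycle.
deps = {
    "core": {"utils"},
    "utils": {"ext_a"},
    "ext_a": {"core"},
    "ext_b": {"core"},
}
cycles = find_cycles(deps)
```

The recursive form is fine for a handful of repositories; a very deep dependency chain would call for an explicit stack instead.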
CallGraphAnalyzer Integration
After the 3 agents complete, CallGraphAnalyzer(cpg) from src/analysis performs graph-based analysis:
- Shared methods: For each duplication, analyzes callers/callees/impact via find_all_callers(), find_all_callees(), analyze_impact(). Calculates consolidation_score = (callers + callees) * instances.
- Cross-repo calls: For dependencies with source_method, finds callees to detect tight coupling. Flags high-coupling dependencies for decoupling priority.
- Consolidation patterns: Groups methods by call graph signature (set of callees). Methods with identical call patterns are consolidation candidates.
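The score formula and the signature grouping from the list above can be sketched as follows. The function names and the input shape (a dict from method name to its set of callees) are hypothetical, and skipping methods with no callees is an added assumption so that leaf methods do not all collapse into one group.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def consolidation_score(callers: int, callees: int, instances: int) -> int:
    """consolidation_score = (callers + callees) * instances, as stated above."""
    return (callers + callees) * instances


def group_by_call_signature(callees_by_method: Dict[str, Set[str]]) -> List[List[str]]:
    """Group methods whose callee sets are identical; keep groups of two or more."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for method, callees in callees_by_method.items():
        if callees:  # assumption: ignore leaf methods with an empty signature
            groups[frozenset(callees)].append(method)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical methods from two repositories.
calls = {
    "repo_a.format_date": {"strftime", "localtime"},
    "repo_b.fmt_date": {"localtime", "strftime"},
    "repo_a.parse": {"strtol"},
}
patterns = group_by_call_signature(calls)
score = consolidation_score(callers=4, callees=2, instances=3)
```

Using a frozenset as the key makes the signature independent of call order, so two methods that call the same functions in a different sequence still land in the same group.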
Graph Insights
Graph analysis produces 3 insight categories stored in state["metadata"]["graph_insights"]:
| Category | Description |
|---|---|
| shared_methods | Methods with consolidation_benefit (high/medium/low) and consolidation_score |
| cross_repo_calls | High-coupling dependencies with decoupling_priority |
| consolidation_patterns | Method groups with same call signature: “Extract to shared library” candidates |
Data Models
Key dataclasses from src/cross_repo/repo_patterns.py:
| Model | Key Fields |
|---|---|
| RepositoryInfo | repo_id, name, language, method_count, file_count |
| CodeDuplication | pattern_name, similarity_score, severity, instances (List[CodeInstance]), potential_savings |
| CrossRepoDependency | source_repo, target_repo, dependency_type (DependencyType), coupling_score, risk_level (RiskLevel) |
| ConsolidationOpportunity | title, priority, estimated_savings, estimated_effort |
| ConsolidationReport | total_repos, total_methods, risk_summary, estimated_total_savings |
Enums: DependencyType (import, function_call, type_reference, …), RiskLevel (critical, high, medium, low).
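As an illustration of how these models fit together, the sketch below declares the two enums and one dataclass using only the field and member names listed above. The field types, the enum string values, and the rule that "high-risk" means critical or high are assumptions; the members elided in the source are left out rather than guessed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RiskLevel(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class DependencyType(Enum):
    # Only the members named above; the source elides the rest.
    IMPORT = "import"
    FUNCTION_CALL = "function_call"
    TYPE_REFERENCE = "type_reference"


@dataclass
class CrossRepoDependency:
    source_repo: str
    target_repo: str
    dependency_type: DependencyType
    coupling_score: float  # assumed numeric; the scale is not documented here
    risk_level: RiskLevel


def high_risk(deps: List[CrossRepoDependency]) -> List[CrossRepoDependency]:
    """Assumption: 'high-risk' means a risk level of critical or high."""
    return [d for d in deps if d.risk_level in (RiskLevel.CRITICAL, RiskLevel.HIGH)]


# Hypothetical dependencies between the projects used in the configuration example.
deps = [
    CrossRepoDependency("extension1", "postgres", DependencyType.FUNCTION_CALL, 0.8, RiskLevel.HIGH),
    CrossRepoDependency("extension1", "postgres", DependencyType.IMPORT, 0.2, RiskLevel.LOW),
]
risky = high_risk(deps)
```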
Configuration
S10 is domain-agnostic. All domain-specific data is loaded from plugins:
- _get_api_keywords(): API keyword-to-function mapping from domain.get_api_keywords()
- _get_domain_hook_keywords(): hook-specific keywords from domain.get_intent_keywords("CROSS_REPO")
- get_extension_dependency_patterns_from_plugin(): extension dependency patterns
Project configuration in config.yaml → projects:
projects:
  registry:
    postgres:
      db_path: data/projects/postgres.duckdb
      language: c
      domain: postgres
    extension1:
      db_path: data/projects/extension1.duckdb
      language: c
      domain: postgres
CLI Usage
# Cross-repo dependency analysis
python -m src.cli query "Find cross-repository dependencies"
# Duplicate detection
python -m src.cli query "Find duplicate code across repositories"
# Consolidation analysis
python -m src.cli query "Show consolidation opportunities"
# Extension dependency analysis
python -m src.cli query "Which extensions depend on memory allocation functions?"
# Hook usage analysis
python -m src.cli query "Find all hook usage patterns"
# Consistency check
python -m src.cli query "Check naming consistency across codebase"
Example Questions
- “Find cross-repository dependencies”
- “Find duplicate code across repositories”
- “Show consolidation opportunities”
- “Which extensions use [function_name]?”
- “Find all hook usage patterns”
- “Check naming consistency across the codebase”
- “What would break if [function] changes?”
- “Show high-risk dependencies”
- “Find functions used by extensions”
Related Scenarios
- Architecture - Internal architecture analysis (S11)
- Refactoring - Code smell detection and dead code (S05)
- Mass Refactoring - Large-scale changes across codebase (S13)
- Security Audit - Security vulnerability scanning (S02)
S10 vs S11: S10 (Cross-Repository) focuses on multi-repository analysis: duplications across repos, inter-repo dependencies, consolidation opportunities, and cross-repo call graph analysis. S11 (Architecture) focuses on the internal architecture of a single project: layer analysis, coupling/cohesion metrics, and circular dependencies within one codebase.