Integration guide for linking CPG code entities with external systems (Git, Issue Trackers, APM).
Table of Contents
- Overview
- Architecture
- Quick Setup
- Git Integration
- GitSyncService
- Available Tags
- Git Queries
- Issue Tracker Integration
- IssueSyncService
- Supported Providers
- Available Tags
- Creating Issues
- APM/Sentry Integration
- SentrySyncService
- Available Tags
- Using the Orchestrator
- Python API
- CLI Interface
- Convenience Functions
- Server-Side API Reference
- Enums
- Core Dataclasses
- Git Dataclasses
- Issue Dataclasses
- Sentry Dataclasses
- ExternalContextBase
- OrchestratorResult
- Troubleshooting
- Next Steps
Overview
External Context Integration allows you to link code entities (methods, functions, classes) in your Code Property Graph (CPG) with metadata from external systems:
- Git: Author information, commit history, code churn, blame data
- Issue Trackers: Jira, GitHub Issues, GitLab Issues
- APM/Error Tracking: Sentry error data, frequency buckets, severity
This enables powerful queries like:
- “Who wrote this code?”
- “What issues are linked to this function?”
- “Which methods have the most production errors?”
- “What code changes most frequently?”
Architecture
+-----------------------------------------------------------------+
|                        External Systems                         |
+---------------+---------------------+---------------------------+
|      Git      |   Issue Trackers    |          Sentry           |
|   (commits)   |  (Jira/GitHub/GL)   |         (errors)          |
+-------+-------+----------+----------+-----------+---------------+
        |                  |                      |
        v                  v                      v
+-----------------------------------------------------------------+
|                   ExternalContextOrchestrator                   |
| +---------------+ +-----------------+ +---------------------+   |
| |GitSyncService | |IssueSyncService | |SentrySyncService    |   |
| +---------------+ +-----------------+ +---------------------+   |
+-----------------------------------------------------------------+
          |                                  |
          v                                  v
+--------------------+           +------------------------+
| PostgreSQL         |           | DuckDB CPG             |
| (raw metadata)     |           | (nodes_tag_v2,         |
| - external_context |           |  edges_tagged_by,      |
| - file_commit_hist |           |  nodes_method)         |
| - git_authors      |           +------------------------+
| - runtime_metrics  |
+--------------------+
Data flows from external systems through the sync services into two stores:
- PostgreSQL (optional): stores raw metadata in the tables external_context, file_commit_history, git_authors, and runtime_metrics
- DuckDB CPG: stores tags in nodes_tag_v2 and links them to methods via edges_tagged_by
All tags are stored in nodes_tag_v2 (not nodes_tag). Method-tag links go through edges_tagged_by.
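The tag storage model described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for DuckDB (an assumption, made only so the example runs without extra dependencies). The column names (id, name, value, src, dst, full_name, filename) are inferred from the queries in this guide, not taken from the actual schema definition, and the helper tags_for_method is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the DuckDB CPG (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes_method (id INTEGER PRIMARY KEY, full_name TEXT, filename TEXT);
    CREATE TABLE nodes_tag_v2 (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
    CREATE TABLE edges_tagged_by (src INTEGER, dst INTEGER);
    """
)

# One method, one tag, and the edge that links them (sample data).
conn.execute("INSERT INTO nodes_method VALUES (1, 'auth.login', 'src/auth.py')")
conn.execute("INSERT INTO nodes_tag_v2 VALUES (10, 'git-author', 'dev@company.com')")
conn.execute("INSERT INTO edges_tagged_by VALUES (1, 10)")


def tags_for_method(db, full_name):
    """Return (tag name, tag value) pairs attached to a method (hypothetical helper)."""
    return db.execute(
        """
        SELECT t.name, t.value
        FROM nodes_method m
        JOIN edges_tagged_by e ON m.id = e.src
        JOIN nodes_tag_v2 t ON e.dst = t.id
        WHERE m.full_name = ?
        ORDER BY t.name
        """,
        (full_name,),
    ).fetchall()


print(tags_for_method(conn, "auth.login"))
```

Every query later in this guide follows the same pattern: start from nodes_method, follow edges_tagged_by from src to dst, and filter on the tag name and value.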
Quick Setup
1. Install Dependencies
pip install -r requirements.txt
2. Configure Environment Variables
# Git (no additional config needed - uses local git)
# GitHub Issues
export GITHUB_TOKEN="ghp_xxxxxxxxxxxx"
# GitLab Issues
export GITLAB_TOKEN="glpat-xxxxxxxxxxxx"
# Jira
export JIRA_TOKEN="your_jira_api_token"
# Sentry
export SENTRY_AUTH_TOKEN="your_sentry_token"
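Before running a sync, it can help to confirm which of these variables are actually set. The following is a minimal sketch; the helper missing_tokens is hypothetical and not part of the package. Only the tokens for the providers you use need to be present.

```python
import os
from typing import List, Mapping, Sequence

# Variable names taken from the configuration step above.
TOKEN_VARS = ("GITHUB_TOKEN", "GITLAB_TOKEN", "JIRA_TOKEN", "SENTRY_AUTH_TOKEN")


def missing_tokens(env: Mapping[str, str], names: Sequence[str] = TOKEN_VARS) -> List[str]:
    """Return the variable names that are unset or empty in env (hypothetical helper)."""
    return [name for name in names if not env.get(name)]


# Report against the current process environment.
for name in missing_tokens(os.environ):
    print(f"{name} is not set")
```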
3. Run Initial Sync
import os
from src.services.external_context import ExternalContextOrchestrator
orchestrator = ExternalContextOrchestrator(
    duckdb_conn=your_duckdb_connection,
    pg_conn=your_postgres_connection,  # optional
    repo_path="/path/to/your/repo",
)
result = orchestrator.sync_all(
    git_config={"since_days": 90, "use_blame": True},
    issue_config={
        "provider": "github",
        "repo": "owner/repo",
        "token": os.getenv("GITHUB_TOKEN"),
    },
    sentry_config={
        "org_slug": "my-org",
        "project_slug": "my-project",
        "token": os.getenv("SENTRY_AUTH_TOKEN"),
    },
)
print(f"Total synced: {result.total_items_synced}, tags: {result.total_tags_created}")
Git Integration
GitSyncService
Syncs git history metadata to CPG tags.
from src.services.external_context import GitSyncService
service = GitSyncService(
    duckdb_conn=conn,
    pg_conn=pg_conn,  # optional, for storing raw data
    repo_path="/path/to/repo",
    max_commits=1000,  # default
    since_days=90,  # default
)
# Sync with blame analysis
result = service.sync(use_blame=True)
print(f"Synced {result.items_synced} commits, created {result.tags_created} tags")
# Sync only specific files, without blame
result = service.sync(files=["src/main.py", "src/auth.py"], use_blame=False)
Constructor parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| duckdb_conn | connection | required | DuckDB connection |
| pg_conn | connection | None | PostgreSQL connection (optional) |
| repo_path | str | "." | Path to git repository |
| max_commits | int | 1000 | Maximum commits to process |
| since_days | int | 90 | How many days back to look |
Sync parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| files | List[str] | None | Limit sync to specific files |
| use_blame | bool | True | Run git blame analysis |
Available Git Tags
| Tag Name | Description | Example Value |
|---|---|---|
| git-author | Email of last modifier | dev@company.com |
| git-commit | SHA of last commit modifying this method | a1b2c3d4... |
| git-branch | Branch where code originated | feature/PROJ-123 |
| git-blame-count | Number of unique authors | 3 |
| git-churn | Number of modifications | 15 |
| git-last-modified | Timestamp of last change | 2025-01-09T10:30:00Z |
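To make the tag definitions above concrete, the sketch below derives four of them from the blame lines that fall inside one method. This guide does not show how GitSyncService computes these values, so the logic is an assumption: BlameLine is a simplified, hypothetical stand-in for GitBlameEntry, blame_tags is a hypothetical helper, and timestamps are assumed to be in UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class BlameLine:
    """Simplified stand-in for GitBlameEntry (hypothetical, for illustration)."""
    commit_sha: str
    author_email: str
    timestamp: datetime


def blame_tags(lines: List[BlameLine]) -> Dict[str, str]:
    """Derive tag values for one method from the blame lines it spans."""
    if not lines:
        return {}
    # The most recent blame line decides author, commit and last-modified.
    latest = max(lines, key=lambda line: line.timestamp)
    unique_authors = {line.author_email for line in lines}
    return {
        "git-author": latest.author_email,
        "git-commit": latest.commit_sha,
        "git-blame-count": str(len(unique_authors)),
        "git-last-modified": latest.timestamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


sample = [
    BlameLine("a1b2c3d4", "dev@company.com", datetime(2025, 1, 9, 10, 30, tzinfo=timezone.utc)),
    BlameLine("0f9e8d7c", "old@company.com", datetime(2024, 6, 1, 8, 0, tzinfo=timezone.utc)),
]
print(blame_tags(sample))
```

Tag values are stored as strings, which is why the queries in the next section cast git-churn to INT before comparing it.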
Git Queries
All queries use nodes_tag_v2 (the current tag table).
-- Find all methods modified by a specific author
SELECT m.full_name, m.filename, t.value AS author
FROM nodes_method m
JOIN edges_tagged_by e ON m.id = e.src
JOIN nodes_tag_v2 t ON e.dst = t.id
WHERE t.name = 'git-author' AND t.value = 'developer@example.com';
-- Find high-churn code (methods changed > 10 times)
SELECT m.full_name, CAST(t.value AS INT) AS churn_count
FROM nodes_method m
JOIN edges_tagged_by e ON m.id = e.src
JOIN nodes_tag_v2 t ON e.dst = t.id
WHERE t.name = 'git-churn' AND CAST(t.value AS INT) > 10
ORDER BY churn_count DESC;
-- Find bus factor candidates (methods with only 1 author)
SELECT m.full_name, t_author.value AS sole_author
FROM nodes_method m
JOIN edges_tagged_by e1 ON m.id = e1.src
JOIN nodes_tag_v2 t_blame ON e1.dst = t_blame.id
JOIN edges_tagged_by e2 ON m.id = e2.src
JOIN nodes_tag_v2 t_author ON e2.dst = t_author.id
WHERE t_blame.name = 'git-blame-count' AND t_blame.value = '1'
AND t_author.name = 'git-author';
-- Find recently modified methods (last 7 days)
SELECT m.full_name, m.filename, t.value AS last_modified
FROM nodes_method m
JOIN edges_tagged_by e ON m.id = e.src
JOIN nodes_tag_v2 t ON e.dst = t.id
WHERE t.name = 'git-last-modified'
AND CAST(t.value AS TIMESTAMP) > NOW() - INTERVAL '7 days'
ORDER BY t.value DESC;
Issue Tracker Integration
IssueSyncService
Links issues to code via commit message references.
import os
from src.services.external_context import IssueSyncService
# GitHub
service = IssueSyncService(
    duckdb_conn=conn,
    pg_conn=pg_conn,
    provider="github",
    repo="owner/repo",
    token=os.getenv("GITHUB_TOKEN"),
    max_issues=500,  # default
)
# GitLab
service = IssueSyncService(
    duckdb_conn=conn,
    provider="gitlab",
    repo="owner/repo",
    base_url="https://gitlab.company.com",
    token=os.getenv("GITLAB_TOKEN"),
)
# Jira
service = IssueSyncService(
    duckdb_conn=conn,
    provider="jira",
    project_key="PROJ",
    base_url="https://company.atlassian.net",
    token=os.getenv("JIRA_TOKEN"),
)
result = service.sync(since_days=90, link_via_commits=True)
Constructor parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| duckdb_conn | connection | required | DuckDB connection |
| pg_conn | connection | None | PostgreSQL connection (optional) |
| provider | str | "github" | Provider: github, gitlab, jira |
| repo | str | None | Repository (GitHub/GitLab: owner/repo) |
| base_url | str | None | Base URL for self-hosted instances |
| project_key | str | None | Jira project key (e.g., PROJ) |
| token | str | None | API authentication token |
| max_issues | int | 500 | Maximum issues to fetch |
Sync parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| since_days | int | 90 | How many days back to look |
| link_via_commits | bool | True | Link issues to code through commit refs |
Supported Providers
| Provider | Config | Issue Pattern |
|---|---|---|
| GitHub | provider="github", repo="owner/repo" | #123, GH-123 |
| GitLab | provider="gitlab", repo="owner/repo" | #123, !123 (MRs) |
| Jira | provider="jira", project_key="PROJ" | PROJ-123 |
Note: For Jira, use the project_key parameter (not repo).
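Since issues are linked to code through references in commit messages, the patterns in the table above can be sketched as regular expressions. This is an illustrative approximation, not the service's actual matching logic; the function extract_issue_refs and the exact expressions are assumptions.

```python
import re
from typing import List, Optional

# One pattern per provider, following the "Issue Pattern" column above.
_GITHUB = re.compile(r"(?<![\w-])(#\d+|GH-\d+)\b")
_GITLAB = re.compile(r"(?<![\w-])([#!]\d+)\b")


def extract_issue_refs(message: str, provider: str,
                       project_key: Optional[str] = None) -> List[str]:
    """Return issue references found in a commit message, in order, without duplicates."""
    if provider == "github":
        pattern = _GITHUB
    elif provider == "gitlab":
        pattern = _GITLAB
    elif provider == "jira":
        if not project_key:
            raise ValueError("project_key is required for Jira")
        pattern = re.compile(rf"\b({re.escape(project_key)}-\d+)\b")
    else:
        raise ValueError(f"unsupported provider: {provider}")
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(pattern.findall(message)))


print(extract_issue_refs("Fix login (#123), see GH-45", "github"))
print(extract_issue_refs("PROJ-123: fix NPE; follow-up to PROJ-99", "jira", "PROJ"))
```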
Available Issue Tags
| Tag Name | Description | Example Value |
|---|---|---|
| issue-id | Issue identifier | PROJ-123, #456 |
| issue-type | Type of issue | bug, feature, refactor |
| issue-status | Current status | open, closed, in_progress |
| issue-priority | Issue priority | critical, high, medium, low |
| issue-label | Issue labels | critical, tech-debt |
Creating Issues
The IssueSyncService can create Jira issues programmatically:
service = IssueSyncService(
    duckdb_conn=conn,
    provider="jira",
    project_key="PROJ",
    base_url="https://company.atlassian.net",
    token=os.getenv("JIRA_TOKEN"),
)
issue_key = service.create_jira_issue(
    summary="Fix NullPointerException in AuthService",
    description="Method authenticate() throws NPE when token is expired",
    issue_type="Bug",  # default
    priority="Medium",  # default
    labels=["codegraph", "auto-detected"],
    components=["auth"],
)
# Returns "PROJ-456" or None on failure
APM/Sentry Integration
SentrySyncService
Syncs error data from Sentry to identify error-prone code.
import os
from src.services.external_context import SentrySyncService
service = SentrySyncService(
    duckdb_conn=conn,
    pg_conn=pg_conn,
    org_slug="my-organization",
    project_slug="my-project",
    token=os.getenv("SENTRY_AUTH_TOKEN"),
    base_url="https://sentry.io",  # default; override for self-hosted
    max_issues=200,  # default
)
result = service.sync(since_days=30, min_events=1)
Constructor parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| duckdb_conn | connection | required | DuckDB connection |
| pg_conn | connection | None | PostgreSQL connection (optional) |
| org_slug | str | None | Sentry organization slug |
| project_slug | str | None | Sentry project slug |
| token | str | None | Sentry auth token |
| base_url | str | "https://sentry.io" | Base URL (override for self-hosted) |
| max_issues | int | 200 | Maximum issues to fetch |
Sync parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| since_days | int | 30 | How many days back to look |
| min_events | int | 1 | Minimum event count to include |
Available Sentry Tags
| Tag Name | Description | Example Value |
|---|---|---|
| sentry-issue | Sentry issue ID | SENTRY-12345 |
| error-frequency | Frequency bucket based on event count | critical, high, medium, low, rare |
| error-type | Exception type | NullPointerException |
| error-level | Severity level | error, fatal, warning |
| sentry-first-seen | When error was first observed | 2025-06-15T08:30:00Z |
The error-frequency tag uses buckets based on total event count:
| Bucket | Event Count |
|---|---|
| critical | >= 10000 |
| high | >= 1000 |
| medium | >= 100 |
| low | >= 10 |
| rare | < 10 |
-- Find methods with critical error frequency
SELECT m.full_name, m.filename, t.value AS frequency
FROM nodes_method m
JOIN edges_tagged_by e ON m.id = e.src
JOIN nodes_tag_v2 t ON e.dst = t.id
WHERE t.name = 'error-frequency' AND t.value = 'critical';
-- Find methods by error type
SELECT m.full_name, t.value AS error_type
FROM nodes_method m
JOIN edges_tagged_by e ON m.id = e.src
JOIN nodes_tag_v2 t ON e.dst = t.id
WHERE t.name = 'error-type' AND t.value LIKE '%NullPointer%';
-- Combine: methods with high+ frequency AND fatal level
SELECT DISTINCT m.full_name, m.filename
FROM nodes_method m
JOIN edges_tagged_by e1 ON m.id = e1.src
JOIN nodes_tag_v2 t_freq ON e1.dst = t_freq.id
JOIN edges_tagged_by e2 ON m.id = e2.src
JOIN nodes_tag_v2 t_level ON e2.dst = t_level.id
WHERE t_freq.name = 'error-frequency' AND t_freq.value IN ('critical', 'high')
AND t_level.name = 'error-level' AND t_level.value = 'fatal';
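The error-frequency thresholds listed in the bucket table above translate directly into a small function. This is a sketch for illustration; the function name error_frequency_bucket is hypothetical, and only the thresholds come from this guide.

```python
def error_frequency_bucket(event_count: int) -> str:
    """Map a Sentry issue's total event count to its error-frequency bucket."""
    # Thresholds are checked from highest to lowest, so the first match wins.
    if event_count >= 10000:
        return "critical"
    if event_count >= 1000:
        return "high"
    if event_count >= 100:
        return "medium"
    if event_count >= 10:
        return "low"
    return "rare"


for count in (25000, 1500, 100, 42, 3):
    print(count, error_frequency_bucket(count))
```

Because the buckets are coarse labels, queries filter on them with equality or IN rather than with numeric comparisons.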
Using the Orchestrator
Python API
The ExternalContextOrchestrator coordinates all sync services.
import os
from src.services.external_context import ExternalContextOrchestrator
# Tokens are read from the environment variables set in Quick Setup
token = os.getenv("GITHUB_TOKEN")
sentry_token = os.getenv("SENTRY_AUTH_TOKEN")
orchestrator = ExternalContextOrchestrator(
    duckdb_conn=conn,
    pg_conn=pg_conn,  # optional
    repo_path=".",
    vector_store=None,  # optional, for semantic indexing
)
# Sync all sources at once
result = orchestrator.sync_all(
    git_config={"since_days": 30, "use_blame": True},
    issue_config={
        "provider": "github",
        "repo": "owner/repo",
        "token": token,
    },
    sentry_config={
        "org_slug": "org",
        "project_slug": "proj",
        "token": sentry_token,
    },
    parallel=True,  # run syncs in parallel
)
print(f"Success: {result.success}")
print(f"Items synced: {result.total_items_synced}")
print(f"Tags created: {result.total_tags_created}")
print(f"Edges created: {result.total_edges_created}")
print(f"Duration: {result.duration_seconds:.1f}s")
# Access per-source results
for source, src_result in result.source_results.items():
    print(f"  {source}: {src_result.items_synced} items, {src_result.tags_created} tags")
# Sync individual sources
git_result = orchestrator.sync_git(since_days=30, use_blame=True)
issue_result = orchestrator.sync_issues(
    provider="github", repo="owner/repo", token=token
)
sentry_result = orchestrator.sync_sentry(
    org_slug="org", project_slug="proj", token=sentry_token
)
# Get sync statistics
stats = orchestrator.get_sync_stats()
CLI Interface
# Sync git history
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--repo-path /path/to/repo \
--sync-git --git-since-days 30
# Sync git without blame analysis
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-git --no-blame
# Sync git with custom max commits
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-git --git-max-commits 2000
# Sync GitHub issues
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-issues --issue-provider github --issue-repo owner/repo \
--issue-token "$GITHUB_TOKEN"
# Sync GitLab issues (self-hosted)
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-issues --issue-provider gitlab --issue-repo owner/repo \
--issue-url https://gitlab.company.com \
--issue-token "$GITLAB_TOKEN"
# Sync Jira issues
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-issues --issue-provider jira --issue-project PROJ \
--issue-url https://company.atlassian.net \
--issue-token "$JIRA_TOKEN"
# Sync Sentry errors
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-sentry --sentry-org my-org --sentry-project my-project \
--sentry-token "$SENTRY_AUTH_TOKEN"
# Sync Sentry (self-hosted)
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-sentry --sentry-org my-org --sentry-project my-project \
--sentry-token "$SENTRY_AUTH_TOKEN" --sentry-url https://sentry.company.com
# Sync all sources
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--repo-path /path/to/repo \
--sync-all
# View sync statistics
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb --stats
# Verbose output
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb --sync-git --verbose
Full CLI flags reference:
| Flag | Description | Default |
|---|---|---|
| --duckdb | Path to DuckDB database | required |
| --pg-url | PostgreSQL connection URL | None |
| --repo-path | Path to git repository | . |
| --sync-all | Sync all sources | - |
| --sync-git | Sync git history | - |
| --sync-issues | Sync issue tracker | - |
| --sync-sentry | Sync Sentry errors | - |
| --git-since-days | Git lookback period in days | 90 |
| --git-max-commits | Maximum git commits to process | 1000 |
| --no-blame | Skip git blame analysis | - |
| --issue-provider | Issue provider (github, gitlab, jira) | - |
| --issue-repo | Repository for GitHub/GitLab | - |
| --issue-project | Project key for Jira | - |
| --issue-url | Base URL for self-hosted instances | - |
| --issue-token | Issue tracker API token | - |
| --sentry-org | Sentry organization slug | - |
| --sentry-project | Sentry project slug | - |
| --sentry-token | Sentry auth token | - |
| --sentry-url | Sentry base URL (self-hosted) | - |
| --stats | Show sync statistics | - |
| --verbose | Verbose output | - |
Convenience Functions
The src.services.external_context package exports three convenience functions for quick one-off syncs:
import os
from src.services.external_context import (
    sync_git_to_cpg,
    sync_issues_to_cpg,
    sync_sentry_to_cpg,
)
# Sync git metadata
result = sync_git_to_cpg(
    duckdb_conn=conn,
    pg_conn=pg_conn,  # optional
    repo_path=".",
    since_days=30,
    use_blame=True,
)
# Sync issues
result = sync_issues_to_cpg(
    duckdb_conn=conn,
    pg_conn=pg_conn,
    provider="github",
    repo="owner/repo",
    token=os.getenv("GITHUB_TOKEN"),
)
# Sync Sentry errors
result = sync_sentry_to_cpg(
    duckdb_conn=conn,
    pg_conn=pg_conn,
    org_slug="my-org",
    project_slug="my-project",
    token=os.getenv("SENTRY_AUTH_TOKEN"),
)
All three return a SyncResult dataclass.
Server-Side API Reference
Enums
ExternalSource – identifies the external system:
| Value | Description |
|---|---|
| GIT | Git repository |
| JIRA | Jira issue tracker |
| GITHUB | GitHub |
| GITLAB | GitLab |
| SENTRY | Sentry error tracking |
| SONARQUBE | SonarQube code quality |
ContextType – type of external context:
| Value | Description |
|---|---|
| COMMIT | Git commit |
| ISSUE | Issue/ticket |
| ERROR | Error/exception |
| METRIC | Performance metric |
| REVIEW | Code review |
Core Dataclasses
SyncResult – result of a sync operation:
| Field | Type | Default | Description |
|---|---|---|---|
| source | ExternalSource | required | Source system |
| context_type | ContextType | required | Type of context |
| success | bool | required | Whether sync succeeded |
| items_synced | int | 0 | Number of items synced |
| items_failed | int | 0 | Number of items that failed |
| tags_created | int | 0 | Number of tags created |
| edges_created | int | 0 | Number of edges created |
| duration_seconds | float | 0.0 | Duration of sync |
| errors | List[str] | field(default_factory=list) | Error messages |
| metadata | Dict[str, Any] | field(default_factory=dict) | Additional metadata |
ExternalTag – a tag to attach to a CPG node:
| Field | Type | Default | Description |
|---|---|---|---|
| name | str | required | Tag name (e.g., git-author) |
| value | str | required | Tag value |
| external_source | ExternalSource | required | Source system |
| external_id | str | required | ID in external system |
| external_url | Optional[str] | None | URL in external system |
| confidence | float | 1.0 | Confidence of the link |
| metadata | Dict[str, Any] | field(default_factory=dict) | Additional metadata |
MethodTagLink – links a method to a tag:
| Field | Type | Description |
|---|---|---|
| method_id | int | CPG method node ID |
| method_full_name | str | Fully qualified method name |
| filename | str | Source file path |
| line_start | int | Method start line |
| line_end | int | Method end line |
| tag | ExternalTag | The tag to attach |
Git Dataclasses
GitCommit:
| Field | Type | Default | Description |
|---|---|---|---|
| sha | str | required | Commit SHA |
| author_email | str | required | Author email |
| author_name | str | required | Author name |
| timestamp | datetime | required | Commit timestamp |
| message | str | required | Commit message |
| files | List[str] | required | Changed files |
| lines_added | int | 0 | Lines added |
| lines_deleted | int | 0 | Lines deleted |
| branch | Optional[str] | None | Branch name |
| issue_refs | List[str] | None | Referenced issue IDs |
GitBlameEntry:
| Field | Type | Description |
|---|---|---|
| commit_sha | str | Commit SHA |
| author_email | str | Author email |
| author_name | str | Author name |
| timestamp | datetime | Commit timestamp |
| line_number | int | Line number |
| line_content | str | Line content |
Issue Dataclasses
Issue:
| Field | Type | Default | Description |
|---|---|---|---|
| id | str | required | Issue ID |
| title | str | required | Issue title |
| description | Optional[str] | required | Issue description |
| issue_type | str | required | Type (bug, feature, etc.) |
| status | str | required | Current status |
| priority | Optional[str] | required | Priority level |
| assignee | Optional[str] | required | Assignee |
| reporter | Optional[str] | required | Reporter |
| labels | List[str] | required | Labels |
| created_at | Optional[datetime] | required | Creation timestamp |
| updated_at | Optional[datetime] | required | Last update timestamp |
| url | Optional[str] | required | Issue URL |
| linked_commits | List[str] | required | Linked commit SHAs |
| linked_files | List[str] | required | Linked file paths |
Sentry Dataclasses
SentryIssue:
| Field | Type | Default | Description |
|---|---|---|---|
| id | str | required | Sentry issue ID |
| short_id | str | required | Short ID (e.g., PROJ-ABC) |
| title | str | required | Issue title |
| culprit | str | required | Culprit (function/file) |
| level | str | required | Severity level |
| status | str | required | Issue status |
| count | int | required | Total event count |
| user_count | int | required | Affected user count |
| first_seen | datetime | required | First occurrence |
| last_seen | datetime | required | Last occurrence |
| url | Optional[str] | None | Sentry issue URL |
| stacktrace_frames | List[Dict] | required | Raw stacktrace frames |
| tags | Dict[str, str] | required | Sentry tags |
StackFrame:
| Field | Type | Default | Description |
|---|---|---|---|
| filename | str | required | Source file |
| function | str | required | Function name |
| lineno | int | required | Line number |
| context_line | Optional[str] | None | Source line content |
| in_app | bool | True | Whether frame is in-app code |
ExternalContextBase
Abstract base class for all sync services. Located in src/services/external_context/base.py.
class ExternalContextBase(ABC):
    def __init__(self, duckdb_conn, pg_conn=None, source: ExternalSource = None):
        ...
Concrete methods:
| Method | Parameters | Returns | Description |
|---|---|---|---|
| create_tag | tag: ExternalTag | int | Creates a tag in nodes_tag_v2, returns tag ID |
| create_tag_edge | method_id: int, tag_id: int | bool | Creates edge in edges_tagged_by |
| find_methods_by_file_lines | filename: str, line_start: int, line_end: int | List[Dict] | Finds methods overlapping the given line range |
| store_external_context | external_id, context_type, raw_data, linked_files=None, linked_cpg_nodes=None, external_url=None | bool | Stores raw context in PostgreSQL |
| get_existing_tags | tag_name: str | Dict[str, int] | Returns existing tags by name as {value: id} dict |
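The overlap test behind find_methods_by_file_lines can be sketched in plain Python. The real method presumably queries nodes_method in DuckDB; this version filters an in-memory list instead, and the dictionary keys (full_name, filename, line_start, line_end) and the function name methods_overlapping are assumptions for illustration.

```python
from typing import Dict, List


def methods_overlapping(methods: List[Dict], filename: str,
                        line_start: int, line_end: int) -> List[Dict]:
    """Return methods in the given file whose line span intersects [line_start, line_end]."""
    return [
        m for m in methods
        if m["filename"] == filename
        # Two closed intervals overlap when each one starts before the other ends.
        and m["line_start"] <= line_end
        and m["line_end"] >= line_start
    ]


methods = [
    {"full_name": "auth.login", "filename": "src/auth.py", "line_start": 10, "line_end": 30},
    {"full_name": "auth.logout", "filename": "src/auth.py", "line_start": 32, "line_end": 40},
    {"full_name": "main.run", "filename": "src/main.py", "line_start": 1, "line_end": 50},
]
hits = methods_overlapping(methods, "src/auth.py", 28, 33)
print([m["full_name"] for m in hits])
```

A changed range that touches even one line of a method counts as a hit, so a single commit hunk can link a tag to several adjacent methods.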
Abstract methods (must be implemented by subclasses):
| Method | Returns | Description |
|---|---|---|
| sync(**kwargs) | SyncResult | Perform the sync operation |
| get_supported_tag_categories() | List[str] | Return list of tag categories this service creates |
OrchestratorResult
Result of an orchestrated multi-source sync.
| Field | Type | Default | Description |
|---|---|---|---|
| success | bool | True | Overall success |
| total_items_synced | int | 0 | Total items across all sources |
| total_tags_created | int | 0 | Total tags created |
| total_edges_created | int | 0 | Total edges created |
| duration_seconds | float | 0.0 | Total duration |
| source_results | Dict[str, SyncResult] | field(default_factory=dict) | Per-source results |
| errors | List[str] | field(default_factory=list) | Collected errors |
Method: add_result(source: str, result: SyncResult) – adds a source result and updates totals.
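A rough sketch of how add_result might update the totals is shown below. The field names follow the tables in this reference, but the classes are simplified stand-ins with a "Sketch" suffix. Two behaviors are assumptions rather than documented facts: a single failed source marks the whole run as failed, and per-source errors are copied into errors with a source prefix.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyncResultSketch:
    """Reduced stand-in for SyncResult (only the fields used here)."""
    success: bool
    items_synced: int = 0
    tags_created: int = 0
    edges_created: int = 0
    errors: List[str] = field(default_factory=list)


@dataclass
class OrchestratorResultSketch:
    """Reduced stand-in for OrchestratorResult."""
    success: bool = True
    total_items_synced: int = 0
    total_tags_created: int = 0
    total_edges_created: int = 0
    source_results: Dict[str, SyncResultSketch] = field(default_factory=dict)
    errors: List[str] = field(default_factory=list)

    def add_result(self, source: str, result: SyncResultSketch) -> None:
        """Record one source's result and fold it into the running totals."""
        self.source_results[source] = result
        self.total_items_synced += result.items_synced
        self.total_tags_created += result.tags_created
        self.total_edges_created += result.edges_created
        # Assumption: any failed source makes the overall run unsuccessful.
        self.success = self.success and result.success
        # Assumption: errors are prefixed with the source they came from.
        self.errors.extend(f"{source}: {err}" for err in result.errors)


total = OrchestratorResultSketch()
total.add_result("git", SyncResultSketch(True, items_synced=120, tags_created=300, edges_created=280))
total.add_result("sentry", SyncResultSketch(False, items_synced=5, errors=["HTTP 401"]))
print(total.success, total.total_items_synced, total.errors)
```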
Troubleshooting
No Tags Created
- Verify the CPG database has methods in nodes_method:
  SELECT COUNT(*) FROM nodes_method;
- Check that file paths in the CPG match the repository structure. Path mismatches (e.g., absolute vs. relative) will prevent method-to-commit linking.
- Run sync with use_blame=True (the default) for git – blame provides line-level precision.
Git Sync Issues
# Verify git history is accessible
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-git --git-since-days 7 --verbose
# Skip blame if it's too slow on large repos
python -m src.services.external_context.orchestrator \
--duckdb /path/to/cpg.duckdb \
--sync-git --no-blame
GitHub Rate Limiting
Use an authenticated token for higher rate limits:
service = IssueSyncService(
    duckdb_conn=conn,
    provider="github",
    repo="owner/repo",
    token=os.getenv("GITHUB_TOKEN"),
)
Sentry API Errors
- Verify org_slug and project_slug are correct (check the Sentry dashboard URL)
- Check the token has project:read and event:read scopes
- Try a smaller date range, for example service.sync(since_days=7), to confirm connectivity
- For self-hosted Sentry, set --sentry-url to your instance URL
Tags Exist but No Edges
If tags appear in nodes_tag_v2 but there are no edges in edges_tagged_by, it typically means file paths in the CPG don’t match the paths returned by git/Sentry. Check:
-- Inspect tag values
SELECT name, value, COUNT(*) FROM nodes_tag_v2 GROUP BY name, value LIMIT 20;
-- Check edge count
SELECT COUNT(*) FROM edges_tagged_by;
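If the mismatch turns out to be absolute versus relative paths, normalizing both sides to repository-relative form before comparing them is one possible remedy. The helper below is a hypothetical sketch, not part of the package, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def to_repo_relative(path: str, repo_root: str) -> str:
    """Convert a path to repository-relative form (hypothetical helper, POSIX paths assumed)."""
    p = PurePosixPath(path)
    if p.is_absolute():
        try:
            return str(p.relative_to(PurePosixPath(repo_root)))
        except ValueError:
            # Path lies outside the repository; leave it unchanged.
            return str(p)
    # PurePosixPath already collapses a leading "./".
    return str(p)


print(to_repo_relative("/home/ci/repo/src/auth.py", "/home/ci/repo"))
print(to_repo_relative("./src/auth.py", "/home/ci/repo"))
```

Comparing the normalized filename values from nodes_method with normalized paths from git or Sentry stack frames should show quickly whether path format is the cause of the missing edges.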
Next Steps
- SQL Query Cookbook – more SQL examples for CPG queries
- Onboarding Scenario – using external context in natural language queries
- Architecture – system architecture details