Integration guide for connecting CodeGraph with GitLab (self-managed or GitLab.com) for automated code review, security scanning, and incremental CPG updates.
Table of Contents¶
- Overview
- Prerequisites
- Webhook Configuration
- CI Pipeline (.gitlab-ci.yml)
- REST API Endpoints
- MCP Tools
- Incremental CPG Updates
- Docker Image Setup
- Troubleshooting
- Next Steps
Overview¶
CodeGraph integrates with GitLab for automated code review, security scanning, and technical debt tracking. Three integration paths:
- CI/CD Pipeline – CodeGraph as a step in .gitlab-ci.yml
- Webhook-driven – push/MR events trigger incremental CPG updates and reviews automatically
- Standalone – deployed alongside GitLab, accessed via REST API / CLI / MCP
GitLab webhook payloads use a GitLab-specific format (object_attributes for MR details, project for repository info). CodeGraph normalizes them into unified models shared across all 4 supported platforms (GitLab, GitHub, SourceCraft, GitVerse).
Prerequisites¶
- CodeGraph instance (Docker or standalone) accessible from GitLab CI runners
- GitLab project with Maintainer or Owner permissions (for webhook setup)
- Docker (for CI pipeline steps)
- LLM API key: YANDEX_API_KEY, GIGACHAT_AUTH_KEY, or OPENAI_API_KEY
Webhook Configuration¶
Endpoint: POST /api/v1/webhooks/gitlab (returns 202 Accepted)
GitLab setup: Project Settings > Webhooks > Add new webhook:
1. URL: https://<codegraph-host>/api/v1/webhooks/gitlab
2. Secret token: any string (plain token verification, not HMAC)
3. Trigger: check Push events and Merge request events
4. SSL verification: enable if CodeGraph uses a valid certificate
Token Verification¶
GitLab uses plain secret token verification (not HMAC-SHA256). CodeGraph compares the X-Gitlab-Token header against the configured secret using constant-time comparison.
Error codes:
- 401 Unauthorized – missing X-Gitlab-Token header
- 403 Forbidden – token does not match configured secret
Note: Unlike GitHub, which uses HMAC-SHA256 over the request body, GitLab sends the secret token directly in the X-Gitlab-Token header. The request body is therefore not signed, and there is no timestamp header to protect against replay attacks. For additional security, use IP allowlisting or a reverse proxy with mTLS.
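The check described above can be sketched in a few lines of Python. This is only an illustration of the technique, not CodeGraph's actual code: the function name verify_gitlab_token is hypothetical, and returning 202 on success simply mirrors the status the endpoint is documented to return.

```python
import hmac
from typing import Optional


def verify_gitlab_token(header_value: Optional[str], configured_secret: str) -> int:
    """Return the HTTP status a receiver would answer with (hypothetical helper)."""
    if header_value is None:
        return 401  # X-Gitlab-Token header is missing
    # compare_digest takes time independent of where the two values differ,
    # which avoids leaking the secret through response timing.
    if not hmac.compare_digest(
        header_value.encode("utf-8"), configured_secret.encode("utf-8")
    ):
        return 403  # token present but does not match the configured secret
    return 202  # accepted; processing continues asynchronously
```

Both values are encoded to bytes first because hmac.compare_digest only accepts str arguments when they are pure ASCII.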
Supported Events¶
| Event Header (X-Gitlab-Event) | Action |
|---|---|
| Push Hook / push | Incremental CPG update via CPGUpdateQueue |
| Merge Request Hook / merge_request | Code review workflow |
Requests for any other event type are not processed; the response is {"status": "skipped"}.
Configuration¶
Section gitlab in config.yaml:
gitlab:
  webhook_secret: "your-gitlab-webhook-token"
  auto_update_on_push: true           # trigger incremental CPG update on push events
  api_url: "https://gitlab.com"       # or your self-managed instance URL
Event Type Detection¶
CodeGraph detects event type from the X-Gitlab-Event header. If the header is missing, it falls back to payload structure detection:
- object_attributes or merge_request key present → merge_request event
- commits or (before + after) present → push event
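A minimal Python sketch of this detection order follows. The function name detect_event_type is hypothetical, the headers mapping is assumed to use the exact key X-Gitlab-Event, and treating a present but unrecognized header as "skipped" (rather than falling back to the payload) is an assumption, since the text only describes the fallback for a missing header.

```python
from typing import Mapping, Optional

# Header values and the event types they map to, as listed under Supported Events.
HEADER_TO_EVENT = {"Push Hook": "push", "Merge Request Hook": "merge_request"}


def detect_event_type(
    headers: Mapping[str, str], payload: Mapping[str, object]
) -> Optional[str]:
    """Return "push", "merge_request", or None for an unsupported event."""
    header = headers.get("X-Gitlab-Event")
    if header is not None:
        # Assumption: an unrecognized header value means the event is skipped.
        return HEADER_TO_EVENT.get(header)
    # Header missing: fall back to the shape of the payload.
    if "object_attributes" in payload or "merge_request" in payload:
        return "merge_request"
    if "commits" in payload or ("before" in payload and "after" in payload):
        return "push"
    return None
```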
Payload Structure¶
Push Hook payload (key fields used by CodeGraph):
{
  "ref": "refs/heads/main",
  "before": "old_commit_sha",
  "after": "new_commit_sha",
  "project": {
    "git_http_url": "https://gitlab.com/org/repo.git",
    "path_with_namespace": "org/repo"
  },
  "user_name": "developer",
  "commits": [
    {
      "id": "new_commit_sha",
      "message": "feat: add feature",
      "author": {"name": "Developer"},
      "added": ["src/new.py"],
      "modified": ["src/existing.py"],
      "removed": []
    }
  ]
}
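To illustrate how these fields could feed an incremental update, here is a small Python sketch that reduces a Push Hook payload to a commit range and a set of touched files. The function summarize_push and the keys of its result are hypothetical names for illustration; CodeGraph's internal model may differ.

```python
from typing import Any, Dict, Mapping


def summarize_push(payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Collapse a Push Hook payload into the fields an update would need."""
    ref = payload["ref"]
    prefix = "refs/heads/"
    branch = ref[len(prefix):] if ref.startswith(prefix) else ref

    touched, removed = set(), set()
    for commit in payload.get("commits", []):
        touched.update(commit.get("added", []))
        touched.update(commit.get("modified", []))
        removed.update(commit.get("removed", []))

    return {
        "repo": payload["project"]["path_with_namespace"],
        "branch": branch,
        "from_commit": payload["before"],
        "to_commit": payload["after"],
        # Simplification: a file removed by any commit in the push is reported
        # as removed, regardless of commit order.
        "changed": sorted(touched - removed),
        "removed": sorted(removed),
    }
```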
Merge Request Hook payload (key fields):
{
  "object_attributes": {
    "iid": 15,
    "action": "open",
    "source_branch": "feature-x",
    "target_branch": "main",
    "title": "Add feature X"
  },
  "user": {"username": "developer"},
  "project": {
    "git_http_url": "https://gitlab.com/org/repo.git",
    "path_with_namespace": "org/repo"
  }
}
Action mapping: open → opened, update → updated, close → closed, merge → merged, reopen → reopened.
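The normalization step can be pictured with the following Python sketch. The MergeRequestEvent dataclass and its field names are hypothetical stand-ins for CodeGraph's unified model; only the payload keys and the action mapping come from this guide. Passing unknown actions through unchanged is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Action mapping as documented above.
ACTION_MAP = {
    "open": "opened",
    "update": "updated",
    "close": "closed",
    "merge": "merged",
    "reopen": "reopened",
}


@dataclass(frozen=True)
class MergeRequestEvent:
    """Hypothetical platform-neutral merge request event."""
    repo: str
    clone_url: str
    number: int
    action: str
    source_branch: str
    target_branch: str
    title: str
    author: str


def normalize_merge_request(payload: Mapping[str, Any]) -> MergeRequestEvent:
    attrs = payload["object_attributes"]
    project = payload["project"]
    return MergeRequestEvent(
        repo=project["path_with_namespace"],
        clone_url=project["git_http_url"],
        number=attrs["iid"],
        # Assumption: actions outside the documented mapping pass through as-is.
        action=ACTION_MAP.get(attrs["action"], attrs["action"]),
        source_branch=attrs["source_branch"],
        target_branch=attrs["target_branch"],
        title=attrs["title"],
        author=payload["user"]["username"],
    )
```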
CI Pipeline (.gitlab-ci.yml)¶
Basic Pipeline¶
Create .gitlab-ci.yml:
stages:
  - analyze
  - review
  - report

variables:
  PROJECT_LANGUAGE: python
  GIT_DEPTH: "0"  # full git history, so the merge request diff base is available

cpg-update:
  stage: analyze
  image: ghcr.io/mkhlsavin/codegraph:latest
  script:
    # Artifact paths must lie inside the project directory, so the CPG is written there
    - gocpg parse --input=. --output=cpg.duckdb --lang=${PROJECT_LANGUAGE}
  artifacts:
    paths:
      - cpg.duckdb
    expire_in: 1 hour
  only:
    - merge_requests
    - main

security-scan:
  stage: review
  image: ghcr.io/mkhlsavin/codegraph:latest
  needs: [cpg-update]
  script:
    - |
      DIFF=$(git diff ${CI_MERGE_REQUEST_DIFF_BASE_SHA}...HEAD)
      curl -sf -X POST "${CODEGRAPH_URL}/api/v1/security/scan-diff" \
        -H "Authorization: Bearer ${CODEGRAPH_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{\"diff_content\": $(echo "$DIFF" | python3 -c 'import sys,json; print(json.dumps(sys.stdin.read()))'), \"output_format\": \"sarif\"}" \
        -o gl-sast-report.json
  artifacts:
    reports:
      sast: gl-sast-report.json
    paths:
      - gl-sast-report.json  # also kept as a regular artifact so the report job can read it
  only:
    - merge_requests

mr-review:
  stage: review
  image: ghcr.io/mkhlsavin/codegraph:latest
  needs: [cpg-update]
  script:
    # Note: a merge request title containing double quotes would break the JSON body below
    - |
      DIFF=$(git diff ${CI_MERGE_REQUEST_DIFF_BASE_SHA}...HEAD)
      curl -sf -X POST "${CODEGRAPH_URL}/api/v1/review/summary" \
        -H "Authorization: Bearer ${CODEGRAPH_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{\"diff_content\": $(echo "$DIFF" | python3 -c 'import sys,json; print(json.dumps(sys.stdin.read()))'), \"title\": \"${CI_MERGE_REQUEST_TITLE}\"}" \
        -o review-results.json
      cat review-results.json
  only:
    - merge_requests

report:
  stage: report
  image: ghcr.io/mkhlsavin/codegraph:latest
  needs: [security-scan, mr-review]
  script:
    - echo "Security scan and review completed"
    - "[ -f gl-sast-report.json ] && cat gl-sast-report.json | python3 -c \"import sys,json; r=json.load(sys.stdin); print(f'Findings: {r.get(\\\"total_count\\\", 0)}')\""
  only:
    - merge_requests
Required CI/CD Variables¶
Configure in Settings > CI/CD > Variables:
| Name | Type | Description |
|---|---|---|
| CODEGRAPH_URL | Variable | CodeGraph API endpoint (e.g. https://codegraph.example.com) |
| CODEGRAPH_TOKEN | Secret (masked) | CodeGraph API authentication token |
| PROJECT_LANGUAGE | Variable | Source language (default: python) |
SAST Integration¶
CodeGraph’s SARIF output integrates with GitLab Security Dashboard. Use the artifacts:reports:sast key to upload findings:
artifacts:
  reports:
    sast: gl-sast-report.json
Findings appear in Security & Compliance > Vulnerability Report (GitLab Ultimate) or in the MR widget (all tiers for SAST).
REST API Endpoints¶
POST /api/v1/webhooks/gitlab – Webhook Receiver¶
POST /api/v1/security/scan-diff – Security Scan¶
Scans a raw unified diff for security vulnerabilities.
Request (ScanDiffRequest):
{
  "diff_content": "unified diff string",
  "output_format": "sarif"
}
| Field | Type | Default | Description |
|---|---|---|---|
| diff_content | string | – | Raw unified diff (required) |
| output_format | string | "json" | "json" or "sarif" |
Response (ScanDiffResponse):
{
  "findings": [
    {
      "finding_id": "f-001",
      "pattern_id": "CWE-89",
      "pattern_name": "SQL Injection",
      "category": "security",
      "severity": "high",
      "method_name": "execute_query",
      "filename": "src/db.py",
      "line_number": 42,
      "description": "User input concatenated into SQL query",
      "cwe_ids": ["CWE-89"],
      "confidence": 0.92
    }
  ],
  "sarif": {},
  "new_count": 1,
  "fixed_count": 0,
  "total_count": 1,
  "critical_count": 0,
  "high_count": 1,
  "medium_count": 0,
  "low_count": 0
}
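Outside of CI, the same endpoint can be called from a script. The sketch below uses only the Python standard library and follows the request shape and Bearer authentication shown in this guide; the helper names build_scan_request, scan_diff, and blocks_merge are hypothetical, and the "block on critical or high" policy is just one possible gate, not something CodeGraph prescribes.

```python
import json
import urllib.request
from typing import Any, Dict


def build_scan_request(
    base_url: str, token: str, diff: str, output_format: str = "json"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/security/scan-diff."""
    body = json.dumps({"diff_content": diff, "output_format": output_format})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/security/scan-diff",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scan_diff(
    base_url: str, token: str, diff: str,
    output_format: str = "json", timeout_s: float = 60.0,
) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_scan_request(base_url, token, diff, output_format)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def blocks_merge(result: Dict[str, Any]) -> bool:
    """Hypothetical gate: block when any critical or high finding is reported."""
    return result.get("critical_count", 0) + result.get("high_count", 0) > 0
```

Applied to the sample response above, blocks_merge returns True because high_count is 1.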
POST /api/v1/review/summary – MR Summary¶
Generates a structured summary from any unified diff (platform-agnostic).
{
  "diff_content": "unified diff string",
  "title": "Optional MR title",
  "description": "Optional MR description"
}
Response: summary, changed_files, additions, deletions, changed_methods, risk_areas.
POST /api/v1/review/commit-message – Commit Message¶
Generates a conventional commit message from a diff.
{
  "diff_content": "unified diff string"
}
Response: message, type, scope, files_changed.
POST /api/v1/review/mr – GitLab MR Review¶
Reviews a GitLab MR directly by fetching the diff from GitLab API (no need to pass raw diff).
{
  "project_id": "org/repo",
  "mr_iid": 15,
  "gitlab_url": "https://gitlab.com",
  "task_description": "Optional review focus",
  "dod_items": ["No security issues", "Tests pass"]
}
Response: recommendation, score, findings[], dod_validation[], summary, processing_time_ms, request_id.
GET /api/v1/webhooks/status/{project_id} – Update Status¶
Returns the latest CPG update pipeline status for a project.
{
  "project_id": "org/repo",
  "last_update": "2026-03-09T12:00:00+00:00",
  "status": "completed",
  "commit_sha": "ddd444",
  "duration_ms": 2345,
  "gocpg_status": "completed",
  "chromadb_synced": 3,
  "error": null
}
| Field | Type | Description |
|---|---|---|
| status | string | completed, failed, in_progress, or unknown |
| last_update | string | ISO 8601 timestamp of last completed update |
| duration_ms | int | Pipeline execution time in milliseconds |
| gocpg_status | string | GoCPG incremental update status |
| chromadb_synced | int | Number of ChromaDB collections updated |
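Because the webhook is processed asynchronously, a client that needs the result has to poll this endpoint. A minimal polling sketch in Python follows; wait_for_update is a hypothetical helper, the fetch_status callable stands in for whatever HTTP client performs the GET request, and treating only completed and failed as terminal states is an assumption.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"completed", "failed"}  # assumption: other states are transient


def wait_for_update(
    fetch_status: Callable[[str], Dict[str, Any]],
    project_id: str,
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the update pipeline reaches a terminal state or time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(project_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"no terminal status for {project_id} after {timeout_s}s")
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays.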
MCP Tools¶
When CodeGraph runs as an MCP server (python -m src.mcp), AI agents can use CPG-aware tools:
| Tool | Description |
|---|---|
| codegraph_project_context | Project overview: CPG stats, languages, hotspots, security findings |
| codegraph_file_context | Per-file methods, metrics, callers, callees, dead code, security |
| codegraph_diff_context | Blast radius analysis for changed files |
| codegraph_query | Execute raw SQL against CPG DuckDB |
| codegraph_find_callers | Find all callers of a function |
| codegraph_find_callees | Find all functions called by a function |
| codegraph_taint_analysis | Trace data flows from sources to sinks |
Incremental CPG Updates¶
When a push event is received via webhook, CodeGraph runs the full incremental update pipeline:
- Resolve project – match by project_id, repository URL (git_http_url), or active project
- GoCPG Update – incremental CPG update via gRPC/subprocess (from_commit → to_commit)
- ChromaDB Sync – update vector store collections for changed files
- WebSocket Notification – broadcast cpg.update.complete event to connected clients
Pipeline behavior:
- Duplicate push events (same project + commit SHA) are deduplicated
- Rapid pushes to the same project are coalesced (merged changed files, latest commit)
- GoCPG failure triggers automatic retry after 60 seconds
- ChromaDB sync continues even if GoCPG update fails
- Webhook always returns 202 Accepted immediately (processing is asynchronous)
Monitor pipeline status via GET /api/v1/webhooks/status/{project_id}.
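The deduplication and coalescing rules can be illustrated with a small in-memory queue. This is a sketch of the behavior described above, not the actual CPGUpdateQueue: the class and field names are hypothetical, and retries, persistence, and concurrency are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class PendingUpdate:
    project_id: str
    from_commit: str
    to_commit: str
    changed_files: Set[str] = field(default_factory=set)


class UpdateQueueSketch:
    """Hypothetical queue showing dedup by (project, SHA) and per-project coalescing."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()
        self._pending: Dict[str, PendingUpdate] = {}

    def enqueue(self, project_id: str, before: str, after: str,
                changed_files: Iterable[str]) -> bool:
        """Return False if this push was already seen, True if it was queued."""
        key = (project_id, after)
        if key in self._seen:
            return False  # duplicate delivery of the same project + commit SHA
        self._seen.add(key)
        pending = self._pending.get(project_id)
        if pending is None:
            self._pending[project_id] = PendingUpdate(
                project_id, before, after, set(changed_files))
        else:
            # Coalesce: keep the earliest base commit, move to the latest
            # commit, and merge the changed files.
            pending.to_commit = after
            pending.changed_files.update(changed_files)
        return True

    def pop(self, project_id: str) -> Optional[PendingUpdate]:
        """Take the coalesced update for a project, if one is waiting."""
        return self._pending.pop(project_id, None)
```

Two quick pushes a→b and b→c to the same project would then be processed as a single update from a to c covering the union of their changed files.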
Docker Image Setup¶
Images are published to GHCR via .github/workflows/publish-ghcr.yml and tagged with the version tag (v*) and latest.
docker pull ghcr.io/mkhlsavin/codegraph:latest
docker run --rm ghcr.io/mkhlsavin/codegraph:latest python -m src.cli health
The image includes the GoCPG binary and supports 11 languages (C, C++, C#, Go, Java, JavaScript, Kotlin, PHP, Python, Ruby, TypeScript).
For self-managed GitLab with a private container registry:
docker tag ghcr.io/mkhlsavin/codegraph:latest registry.company.com/codegraph:latest
docker push registry.company.com/codegraph:latest
Then reference registry.company.com/codegraph:latest in .gitlab-ci.yml.
Troubleshooting¶
Webhook Token Error (401/403)¶
401 Unauthorized – X-Gitlab-Token header is missing:
- Verify the secret token is configured in GitLab webhook settings
- Ensure the webhook URL is correct (/api/v1/webhooks/gitlab, not /api/v1/webhooks/github)
403 Forbidden – token does not match:
- Verify the token in GitLab matches config.yaml > gitlab.webhook_secret
- Tokens are compared using constant-time comparison; check for trailing whitespace or newlines
Webhook Returns 400¶
- Check that the payload is valid JSON
- Ensure Content-Type is application/json (GitLab sets this by default)
CI Pipeline Cannot Reach CodeGraph¶
- CODEGRAPH_URL must be reachable from GitLab CI runners
- Verify: curl -sf ${CODEGRAPH_URL}/api/v1/health
- For self-managed GitLab: check network rules between runner and CodeGraph
- For shared runners (GitLab.com): CodeGraph must be publicly accessible or use a tunnel
SAST Report Not Appearing¶
- Verify the artifact path matches: artifacts:reports:sast: gl-sast-report.json
- Use output_format: "sarif" in the scan-diff request
- SAST reports require the MR to be open (not merged)
MR Review Returns Empty Results¶
- Verify CI_MERGE_REQUEST_DIFF_BASE_SHA is available (requires merge_requests trigger)
- Check that GIT_DEPTH: "0" is set in variables: for full git history (GitLab CI equivalent of fetch-depth: 0)
- Test API token: curl -sf ${CODEGRAPH_URL}/api/v1/health -H "Authorization: Bearer ${CODEGRAPH_TOKEN}"
Self-Managed GitLab Certificate Issues¶
If CodeGraph cannot reach a self-managed GitLab instance with a self-signed certificate:
- Mount the CA certificate into the CodeGraph container
- Set SSL_CERT_FILE environment variable to the CA bundle path
Incremental Update Shows “unavailable”¶
- Verify GoCPG is installed and accessible: gocpg --version
- Check that the GOCPG_PATH environment variable points to the GoCPG binary
- Review logs: grep "GoCPG" /var/log/codegraph/*.log
Next Steps¶
- GitHub Integration – GitHub webhook and CI integration
- SourceCraft Integration – Yandex SourceCraft integration
- GitVerse Integration – SberTech GitVerse integration
- Configuration – Full configuration reference
- REST API docs – Complete API documentation