Architecture Guide for DevOps and Infrastructure Teams
Table of Contents¶
- Overview
- 1. System Requirements
- 1.1 Hardware Requirements
- 1.2 Software Requirements
- 1.3 Network Ports
- 2. Docker Compose (Development)
- 2.1 File Structure
- 2.2 docker-compose.yml
- 2.3 Environment Variables (.env)
- 2.4 Startup
- 3. Kubernetes (Production)
- 3.1 Architecture
- 3.2 Namespace and ConfigMap
- 3.3 Secrets
- 3.4 Deployment
- 3.5 Service and Ingress
- 3.6 HorizontalPodAutoscaler
- 3.7 NetworkPolicy
- 4. Air-Gapped Deployment
- 4.1 Isolated Environment Characteristics
- 4.2 Air-Gapped Configuration
- 4.3 Preparing Artifacts for Air-Gapped
- 4.4 Installation in Air-Gapped Environment
- 5. Deployment Security
- 5.1 TLS/SSL Configuration
- 5.2 Pod Security Standards
- 5.3 Secrets Encryption
- 6. Monitoring and Observability
- 6.1 Prometheus ServiceMonitor
- 6.2 Grafana Dashboard
- 6.3 Alertmanager Rules
- 7. Backup and Recovery
- 7.1 PostgreSQL Backup
- 7.2 DuckDB Backup
- 7.3 Disaster Recovery
- 8. Migration and Upgrades
- 8.1 Rolling Update
- 8.2 Database Migration
- Related Documents
Overview¶
CodeGraph supports multiple deployment modes for different security and scaling requirements:
| Mode | Description | Recommended For |
|---|---|---|
| Docker Compose | Single node, simple setup | Development, testing |
| Kubernetes | Clustered, HA, auto-scaling | Production |
| Air-Gapped | Isolated network, local LLM | High-security environments |
1. System Requirements¶
1.1 Hardware Requirements¶
| Component | Minimum | Recommended | Production |
|---|---|---|---|
| CPU | 4 cores | 8 cores | 16+ cores |
| RAM | 8 GB | 16 GB | 32+ GB |
| SSD | 50 GB | 100 GB | 500+ GB |
| GPU | - | NVIDIA 8GB+ | NVIDIA 24GB+ |
Note: GPU is only required for local LLM (air-gapped mode).
1.2 Software Requirements¶
| Component | Version | Purpose |
|---|---|---|
| Python | 3.11+ | Main runtime environment |
| PostgreSQL | 16+ | User, session, audit storage |
| DuckDB | 1.4.4+ | CPG graph storage (DuckPGQ extension) |
| Go | 1.26+ | GoCPG CPG generator (CGO required) |
| Docker | 24+ | Containerization (optional) |
| Kubernetes | 1.28+ | Orchestration (production) |
Note: GoCPG is used for CPG generation from source code, producing DuckDB output directly. Requires Go 1.26+ with CGO enabled for tree-sitter parsing.
1.3 Network Ports¶
| Port | Service | Protocol |
|---|---|---|
| 8000 | CodeGraph API | HTTP/HTTPS |
| 5432 | PostgreSQL | TCP |
| 514 | SIEM (Syslog) | UDP/TCP |
| 8200 | HashiCorp Vault | HTTP/HTTPS |
| 9090 | Prometheus | HTTP |
| 9093 | Alertmanager | HTTP |
| 3000 | Grafana | HTTP |
2. Docker Compose (Development)¶
2.1 File Structure¶
codegraph/
├── docker-compose.yml
├── docker-compose.override.yml   # Local settings
├── .env                          # Environment variables
├── config.yaml                   # Application configuration
├── monitoring/
│   ├── prometheus.yml            # Prometheus config
│   ├── alertmanager.yml          # Alertmanager config
│   ├── rules/                    # Alert rules
│   ├── grafana-datasources.yml   # Grafana datasources
│   └── grafana-dashboards.yml    # Grafana dashboard provisioning
└── data/
    ├── duckdb/                   # DuckDB files
    └── projects/                 # Project workspaces
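The bind-mounted directories in this layout must exist before the first `docker compose up`; a one-liner to scaffold them (names taken from the tree above):

```shell
# Create the host directories that docker-compose.yml bind-mounts
mkdir -p monitoring/rules data/duckdb data/projects logs
```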
2.2 docker-compose.yml¶
The compose file defines six services — api, postgres, prometheus, alertmanager, grafana, and gocpg:
services:
  # ==========================================================================
  # CodeGraph API Server
  # ==========================================================================
  api:
    build:
      context: .
      dockerfile: Dockerfile
      target: production
    image: codegraph:latest
    container_name: codegraph-api
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      # Database
      - DATABASE_URL=postgresql+asyncpg://codegraph:${POSTGRES_PASSWORD}@postgres:5432/codegraph
      # Authentication
      - API_JWT_SECRET=${API_JWT_SECRET}
      - API_ADMIN_USERNAME=${API_ADMIN_USERNAME:-admin}
      - API_ADMIN_PASSWORD=${API_ADMIN_PASSWORD}
      # Yandex AI Studio LLM
      - YANDEX_API_KEY=${YANDEX_API_KEY}
      - YANDEX_FOLDER_ID=${YANDEX_FOLDER_ID}
      # Environment
      - ENVIRONMENT=production
      - LOG_LEVEL=${LOG_LEVEL:-INFO}
      # Security
      - SECURITY_ENABLED=${SECURITY_ENABLED:-true}
      - DLP_ENABLED=${DLP_ENABLED:-true}
      # CORS (production)
      - CORS_ALLOWED_ORIGINS=${CORS_ALLOWED_ORIGINS:-}
    volumes:
      - ./config.yaml:/app/config.yaml:ro
      - ./data/projects:/app/data/projects
      - ./data/duckdb:/app/data/duckdb
      - ./logs:/app/logs
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - codegraph-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    # Note: Uses 1 Uvicorn worker (not config.yaml api.workers) for in-memory cache coherence
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G
        reservations:
          cpus: '2'
          memory: 4G
  # ==========================================================================
  # PostgreSQL Database
  # ==========================================================================
  postgres:
    image: postgres:17-alpine
    container_name: codegraph-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=codegraph
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=codegraph
      - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - codegraph-network
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U codegraph -d codegraph"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
  # ==========================================================================
  # Prometheus Monitoring
  # ==========================================================================
  prometheus:
    image: prom/prometheus:v3.10.0
    container_name: codegraph-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./monitoring/rules:/etc/prometheus/rules:ro
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.enable-lifecycle'
      - '--storage.tsdb.retention.time=30d'
    networks:
      - codegraph-network
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9090/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3
  # ==========================================================================
  # Alertmanager
  # ==========================================================================
  alertmanager:
    image: prom/alertmanager:v0.31.1
    container_name: codegraph-alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    environment:
      - ALERTMANAGER_WEBHOOK_URL=${ALERTMANAGER_WEBHOOK_URL:-http://localhost:9095/alert}
    volumes:
      - ./monitoring/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    networks:
      - codegraph-network
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:9093/-/healthy"]
      interval: 30s
      timeout: 10s
      retries: 3
  # ==========================================================================
  # Grafana Dashboard
  # ==========================================================================
  grafana:
    image: grafana/grafana:12.4.0
    container_name: codegraph-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=${GRAFANA_ROOT_URL:-http://localhost:3000}
    volumes:
      - grafana-data:/var/lib/grafana
      - ./monitoring/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml:ro
      - ./monitoring/grafana-dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml:ro
    depends_on:
      - prometheus
    networks:
      - codegraph-network
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:3000/api/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
  # ==========================================================================
  # GoCPG - Code Property Graph Generator
  # ==========================================================================
  # Usage:
  #   docker compose run --rm gocpg parse --input=/src --output=/out/cpg.duckdb --lang=c
  gocpg:
    build:
      context: ./gocpg
      dockerfile: Dockerfile
    image: gocpg:latest
    container_name: codegraph-gocpg
    volumes:
      - ${GOCPG_SOURCE_PATH:-.}:/src:ro
      - ./data/duckdb:/out
    networks:
      - codegraph-network
    deploy:
      resources:
        limits:
          memory: 4G
networks:
  codegraph-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
volumes:
  postgres-data:
    driver: local
  prometheus-data:
    driver: local
  grafana-data:
    driver: local
2.3 Environment Variables (.env)¶
# =============================================================================
# CodeGraph Environment Variables
# =============================================================================
# PostgreSQL
POSTGRES_PASSWORD=<secure-password-here>
# JWT Authentication
API_JWT_SECRET=<64-char-random-string>
API_ADMIN_USERNAME=admin
API_ADMIN_PASSWORD=<secure-admin-password>
# LLM Providers
# Yandex AI Studio
YANDEX_API_KEY=<your-yandex-api-key>
YANDEX_FOLDER_ID=<your-yandex-folder-id>
# GigaChat (Sber) — alternative provider
# GIGACHAT_AUTH_KEY=<base64-encoded-credentials>
# Security Features
SECURITY_ENABLED=true
DLP_ENABLED=true
# Grafana
GRAFANA_ADMIN_PASSWORD=<secure-grafana-password>
# Alertmanager webhook (optional)
# ALERTMANAGER_WEBHOOK_URL=http://localhost:9095/alert
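The placeholder values above should never be hand-typed; assuming `openssl` is available, suitable random values can be generated like this:

```shell
# 32 random bytes, hex-encoded -> exactly 64 characters for API_JWT_SECRET
openssl rand -hex 32
# Strong passwords for POSTGRES_PASSWORD / API_ADMIN_PASSWORD / GRAFANA_ADMIN_PASSWORD
openssl rand -base64 24
```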
2.4 Startup¶
# Create .env file
cp .env.example .env
# Edit variables
nano .env
# Start all services
docker compose up -d
# Check status
docker compose ps
# View logs
docker compose logs -f api
# Initialize database
docker compose exec api python -m alembic upgrade head
# Create administrator
docker compose exec api python -m src.api.cli create-admin \
--username admin --password <secure-password>
3. Kubernetes (Production)¶
3.1 Architecture¶
┌─────────────────────────────────────────────────────────────────────────────┐
│ KUBERNETES CLUSTER │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ INGRESS CONTROLLER │ │
│ │ (nginx / traefik / istio) │ │
│ │ │ │
│ │ api.codegraph.company.com ────────────────► codegraph-api:8000 │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CODEGRAPH NAMESPACE │ │
│ │ │ │
│ │ ┌───────────────────┐ ┌───────────────────┐ ┌─────────────────┐ │ │
│ │ │ codegraph-api │ │ codegraph-api │ │ codegraph-api │ │ │
│ │ │ (replica 1) │ │ (replica 2) │ │ (replica 3) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ CPU: 2 │ │ CPU: 2 │ │ CPU: 2 │ │ │
│ │ │ RAM: 4Gi │ │ RAM: 4Gi │ │ RAM: 4Gi │ │ │
│ │ └───────────────────┘ └───────────────────┘ └─────────────────┘ │ │
│ │ │ │ │ │ │
│ │ └────────────────────┼──────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌───────────────────────────────────────────────────────────────┐ │ │
│ │ │ SERVICES │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │ │ │
│ │ │ │ PostgreSQL │ │ GoCPG │ │ HashiCorp Vault │ │ │ │
│ │ │ │ (StatefulSet)│ │ (StatefulSet)│ │ (External) │ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ │ PVC: 100Gi │ │ PVC: 50Gi │ │ │ │ │ │
│ │ │ └─────────────┘ └─────────────┘ └─────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
3.2 Namespace and ConfigMap¶
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: codegraph
  labels:
    name: codegraph
    istio-injection: enabled  # If using Istio
    pod-security.kubernetes.io/enforce: restricted
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: codegraph-config
  namespace: codegraph
data:
  config.yaml: |
    domain:
      name: postgresql
      auto_activate: true
    api:
      host: "0.0.0.0"
      port: 8000
      workers: 1  # Single worker for in-memory cache coherence (TTLCache)
    security:
      enabled: true
      dlp:
        enabled: true
        pre_request:
          enabled: true
          default_action: "WARN"
        post_response:
          enabled: true
          default_action: "MASK"
      siem:
        enabled: true
        syslog:
          enabled: true
          host: "siem-syslog.security.svc.cluster.local"
          port: 514
      vault:
        enabled: true
        url: "http://vault.vault.svc.cluster.local:8200"
        auth_method: "kubernetes"
        kubernetes:
          role: "codegraph"
3.3 Secrets¶
# secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: codegraph-secrets
  namespace: codegraph
type: Opaque
stringData:
  DATABASE_URL: "postgresql+asyncpg://codegraph:password@postgres:5432/codegraph"
  API_JWT_SECRET: "<64-char-random-string>"
  YANDEX_API_KEY: "<your-yandex-api-key>"
  YANDEX_FOLDER_ID: "<your-yandex-folder-id>"
---
# For production use External Secrets Operator + Vault
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: codegraph-vault-secrets
  namespace: codegraph
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: codegraph-secrets
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: codegraph/database
        property: url
    - secretKey: API_JWT_SECRET
      remoteRef:
        key: codegraph/api
        property: jwt_secret
    - secretKey: YANDEX_API_KEY
      remoteRef:
        key: codegraph/llm
        property: yandex_api_key
3.4 Deployment¶
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: codegraph-api
  namespace: codegraph
  labels:
    app: codegraph
    component: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: codegraph
      component: api
  template:
    metadata:
      labels:
        app: codegraph
        component: api
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/api/v1/metrics"
    spec:
      serviceAccountName: codegraph
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: api
          image: codegraph:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
              name: http
          envFrom:
            - secretRef:
                name: codegraph-secrets
          env:
            - name: SECURITY_ENABLED
              value: "true"
            - name: DLP_ENABLED
              value: "true"
          volumeMounts:
            - name: config
              mountPath: /app/config.yaml
              subPath: config.yaml
            - name: duckdb-data
              mountPath: /app/data/duckdb
          resources:
            requests:
              cpu: "1"
              memory: "2Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
          livenessProbe:
            httpGet:
              path: /api/v1/health/live
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /api/v1/health/ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
      volumes:
        - name: config
          configMap:
            name: codegraph-config
        - name: duckdb-data
          persistentVolumeClaim:
            claimName: codegraph-duckdb-pvc
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: codegraph
                topologyKey: kubernetes.io/hostname
3.5 Service and Ingress¶
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: codegraph-api
  namespace: codegraph
spec:
  type: ClusterIP
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
      name: http
  selector:
    app: codegraph
    component: api
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: codegraph-ingress
  namespace: codegraph
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.codegraph.company.com
      secretName: codegraph-tls
  rules:
    - host: api.codegraph.company.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: codegraph-api
                port:
                  number: 8000
3.6 HorizontalPodAutoscaler¶
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: codegraph-api-hpa
  namespace: codegraph
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: codegraph-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
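For intuition on when this HPA scales: the controller computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A worked example against the 70% CPU target, sketched with awk:

```shell
# 3 replicas averaging 95% CPU against the 70% target -> ceil(3 * 95 / 70) = 5
awk 'BEGIN {
  c = 3; cur = 95; tgt = 70
  d = c * cur / tgt            # 4.07...
  r = int(d); if (r < d) r++   # ceil
  print r
}'
```

On the way back down, the policy above removes at most 10% of pods per minute after the 300-second stabilization window, so recovery from a spike is gradual.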
3.7 NetworkPolicy¶
# networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: codegraph-network-policy
  namespace: codegraph
spec:
  podSelector:
    matchLabels:
      app: codegraph
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow traffic from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8000
    # Allow traffic from Prometheus
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8000
  egress:
    # PostgreSQL
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Vault
    - to:
        - namespaceSelector:
            matchLabels:
              name: vault
      ports:
        - protocol: TCP
          port: 8200
    # SIEM (Syslog)
    - to:
        - namespaceSelector:
            matchLabels:
              name: security
      ports:
        - protocol: UDP
          port: 514
    # LLM APIs (GigaChat, Yandex AI, OpenAI)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
    # DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
4. Air-Gapped Deployment¶
4.1 Isolated Environment Characteristics¶
┌────────────────────────────────────────────────────────────────────────┐
│ AIR-GAPPED ENVIRONMENT │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ NO INTERNET ACCESS │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ │
│ │ │ CodeGraph │ │ Local LLM │ │ Local Container │ │ │
│ │ │ API │ │ (llama.cpp) │ │ Registry │ │ │
│ │ │ │──│ │ │ │ │ │
│ │ │ DLP: ON │ │ │ │ registry.local:5000 │ │ │
│ │ │ SIEM: ON │ │ │ │ │ │ │
│ │ │ Vault: ON │ │ │ │ │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ NO EXTERNAL LLM APIs │
│ ALL DATA STAYS ON-PREMISE │
│ │
└────────────────────────────────────────────────────────────────────────┘
4.2 Air-Gapped Configuration¶
# config.yaml for air-gapped
llm:
  # Use local LLM instead of GigaChat
  provider: "local"
  # Online mode: Qwen3-235B via Yandex AI Studio
  yandex:
    model: "qwen3-235b"
  # Offline mode: Local model via QWEN3_MODEL_PATH env var
  local:
    # Path to model (transferred on media or set QWEN3_MODEL_PATH env var)
    model_path: "${QWEN3_MODEL_PATH:-/models/model.gguf}"
    n_ctx: 8192
    n_gpu_layers: -1  # All layers on GPU
    n_batch: 512
    n_threads: 8
security:
  enabled: true
  # DLP works locally
  dlp:
    enabled: true
    webhook:
      enabled: false  # No external webhooks
  # SIEM — local server
  siem:
    enabled: true
    syslog:
      enabled: true
      host: "siem.local"
      port: 514
  # Vault — local instance
  vault:
    enabled: true
    url: "http://vault.local:8200"
    auth_method: "approle"
4.3 Preparing Artifacts for Air-Gapped¶
#!/bin/bash
# prepare-airgapped.sh — run on internet-connected machine
# 1. Download Docker images
docker pull codegraph:latest
docker pull postgres:17-alpine
docker pull gocpg:latest
docker pull hashicorp/vault:1.15
# 2. Save images to tar
docker save codegraph:latest postgres:17-alpine \
gocpg:latest hashicorp/vault:1.15 \
| gzip > codegraph-images.tar.gz
# 3. Download LLM model
wget https://huggingface.co/...
# 4. Download Python dependencies
pip download -d ./packages -r requirements.txt
# 5. Package everything
tar -czvf codegraph-airgapped-bundle.tar.gz \
codegraph-images.tar.gz \
model.gguf \
packages/ \
config/ \
scripts/
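Because the bundle crosses the air gap on removable media, it is worth recording a checksum on the connected side and verifying it on the isolated side before installation:

```shell
# Connected side, after packaging
sha256sum codegraph-airgapped-bundle.tar.gz > codegraph-airgapped-bundle.tar.gz.sha256
# Isolated side, before running install-airgapped.sh
sha256sum -c codegraph-airgapped-bundle.tar.gz.sha256
```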
4.4 Installation in Air-Gapped Environment¶
#!/bin/bash
# install-airgapped.sh — run in isolated environment
# 1. Extract bundle
tar -xzvf codegraph-airgapped-bundle.tar.gz
# 2. Load Docker images
gunzip -c codegraph-images.tar.gz | docker load
# 3. Install Python dependencies from local cache
pip install --no-index --find-links=./packages -r requirements.txt
# 4. Copy model
cp model.gguf /models/
# 5. Start services
docker compose -f docker-compose.airgapped.yml up -d
5. Deployment Security¶
5.1 TLS/SSL Configuration¶
# Nginx Ingress with TLS 1.3
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: codegraph-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.3"
    nginx.ingress.kubernetes.io/ssl-ciphers: "TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
      add_header X-Content-Type-Options "nosniff" always;
      add_header X-Frame-Options "DENY" always;
      add_header X-XSS-Protection "1; mode=block" always;
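The `max-age=31536000` in the HSTS header is one year expressed in seconds:

```shell
# 365 days * 24 h * 60 min * 60 s
awk 'BEGIN { print 365 * 24 * 60 * 60 }'
```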
5.2 Pod Security Standards¶
Kubernetes 1.28+ uses Pod Security Admission (PSA); PodSecurityPolicy was removed in Kubernetes 1.25. Apply the restricted security profile via namespace labels:
# namespace.yaml with Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: codegraph
  labels:
    name: codegraph
    # Pod Security Admission (replaces deprecated PodSecurityPolicy)
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
The restricted profile enforces:
- runAsNonRoot: true
- allowPrivilegeEscalation: false
- Drop ALL capabilities
- Read-only root filesystem
- No host network/PID/IPC
These are already configured in the Deployment spec (section 3.4) via securityContext.
5.3 Secrets Encryption¶
# EncryptionConfiguration for etcd
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
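The aescbc provider expects a base64-encoded 32-byte key; assuming a Linux host, one way to generate it:

```shell
# 32 random bytes, base64-encoded, for the `secret:` field above
head -c 32 /dev/urandom | base64 | tr -d '\n'
echo
```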
6. Monitoring and Observability¶
6.1 Prometheus ServiceMonitor¶
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: codegraph-monitor
  namespace: codegraph
spec:
  selector:
    matchLabels:
      app: codegraph
  endpoints:
    - port: http
      path: /api/v1/metrics
      interval: 30s
6.2 Grafana Dashboard¶
CodeGraph exposes Prometheus metrics with the rag_ prefix via /api/v1/metrics:
| Panel | Metric | Description |
|---|---|---|
| Active Requests | rag_active_requests | In-flight requests (gauge) |
| Total Requests | rate(rag_total_requests[5m]) | Requests per second |
| Latency P95 | histogram_quantile(0.95, rate(rag_scenario_duration_seconds_bucket[5m])) | 95th percentile latency |
| LLM Latency | histogram_quantile(0.95, rate(rag_llm_latency_seconds_bucket[5m])) | LLM API latency |
| LLM Tokens | sum(rate(rag_llm_tokens_total[1h])) by (model) | Token usage by model |
| LLM Errors | rate(rag_llm_errors_total[5m]) | LLM API errors |
| Cache Hit Rate | rate(rag_cache_hits_total[5m]) / (rate(rag_cache_hits_total[5m]) + rate(rag_cache_misses_total[5m])) | Cache effectiveness |
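As a worked example of the cache hit rate expression, with hypothetical counter rates of 450 hits/s and 50 misses/s:

```shell
# hits / (hits + misses) = 450 / 500, i.e. a 90% hit rate
awk 'BEGIN { hits = 450; misses = 50; printf "%.2f\n", hits / (hits + misses) }'
```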
6.3 Alertmanager Rules¶
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: codegraph-alerts
  namespace: codegraph
spec:
  groups:
    - name: codegraph
      rules:
        - alert: CodeGraphHighLatency
          expr: histogram_quantile(0.95, rate(rag_scenario_duration_seconds_bucket[5m])) > 30
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High P95 latency on CodeGraph API"
        - alert: CodeGraphLLMErrors
          expr: rate(rag_llm_errors_total[5m]) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Elevated LLM API error rate"
        - alert: CodeGraphPodNotReady
          expr: kube_pod_status_ready{namespace="codegraph", condition="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "CodeGraph pod not ready"
7. Backup and Recovery¶
7.1 PostgreSQL Backup¶
#!/bin/bash
# backup-postgres.sh
BACKUP_DIR="/backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
NAMESPACE="codegraph"
# Create backup
kubectl exec -n $NAMESPACE postgres-0 -- \
pg_dump -U codegraph codegraph | gzip > $BACKUP_DIR/codegraph_$DATE.sql.gz
# Remove old backups (older than 30 days)
find $BACKUP_DIR -name "*.sql.gz" -mtime +30 -delete
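A backup is only as good as its last successful restore test; at minimum, verify gzip integrity of the newest dump right after it is written (same variables as the script above):

```shell
# A corrupt dump fails here instead of during disaster recovery
latest=$(ls -t "$BACKUP_DIR"/*.sql.gz | head -n 1)
gunzip -t "$latest" && echo "backup OK: $latest"
```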
7.2 DuckDB Backup¶
#!/bin/bash
# backup-duckdb.sh
BACKUP_DIR="/backups/duckdb"
DATE=$(date +%Y%m%d_%H%M%S)
# Copy DuckDB files
kubectl cp codegraph/codegraph-api-0:/app/data/duckdb $BACKUP_DIR/duckdb_$DATE
# Compress
tar -czvf $BACKUP_DIR/duckdb_$DATE.tar.gz -C $BACKUP_DIR duckdb_$DATE
rm -rf $BACKUP_DIR/duckdb_$DATE
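Listing the tarball is a cheap integrity check worth running before the uncompressed copy is removed (same variables as the script above):

```shell
# Non-zero exit here means the archive is unreadable
tar -tzf "$BACKUP_DIR/duckdb_$DATE.tar.gz" > /dev/null && echo "archive OK"
```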
7.3 Disaster Recovery¶
| RPO | RTO | Strategy |
|---|---|---|
| 1 hour | 4 hours | Hourly snapshots, standby cluster |
| 24 hours | 24 hours | Daily backups, manual recovery |
| 1 week | 72 hours | Weekly backups, cold standby |
8. Migration and Upgrades¶
8.1 Rolling Update¶
# Update image with zero downtime
kubectl set image deployment/codegraph-api \
api=codegraph:v2.0.0 \
-n codegraph
# Monitor rollout
kubectl rollout status deployment/codegraph-api -n codegraph
# Rollback on issues
kubectl rollout undo deployment/codegraph-api -n codegraph
8.2 Database Migration¶
# Run Alembic migrations
kubectl exec -n codegraph deployment/codegraph-api -- \
python -m alembic upgrade head
# Check current version
kubectl exec -n codegraph deployment/codegraph-api -- \
python -m alembic current
Related Documents¶
- Enterprise Security Brief — Security overview
- RBAC Authorization — Access control
- SIEM Integration — SIEM integration
- LLM Security — LLM interaction protection
Version: 1.2 | March 2026