Configuration Guide¶
Configure CodeGraph for your environment.
Table of Contents¶
- Configuration Files
- API Server Configuration
- Server Settings
- Database Configuration
- Authentication Configuration
- Rate Limiting Configuration
- CORS Configuration
- Demo Endpoint Configuration
- LLM Provider Configuration
- GigaChat
- Local LLM (llama-cpp-python)
- OpenAI
- Yandex Cloud AI Studio (YandexGPT)
- Domain Configuration
- CPG Database Configuration
- Retrieval Settings
- Query Limits
- Analysis Settings
- Structural Patterns Configuration
- Environment Variables Reference
- Performance Tuning
- For Production (Multiple Workers)
- For Development (Fast Reload)
- For Limited Resources
- Security Best Practices
- Multi-Tenant Isolation
- Validation
- Next Steps
Configuration Files¶
| File | Purpose |
|---|---|
| `.env` | Environment variables (API keys, database URLs) |
| `config.yaml` | Main configuration (LLM, retrieval, analysis) |
| `src/api/config.py` | API server configuration |
| `config/prompts/*.yaml` | Prompt templates |
API Server Configuration¶
Server Settings¶
Configure the server host, port, and worker count via environment variables or CLI arguments:
Via Environment Variables:
export API_HOST="0.0.0.0" # Bind to all interfaces
export API_PORT="8000" # Port number
export API_WORKERS="4" # Number of worker processes
export API_DEBUG="false" # Debug mode (auto-reload)
Via CLI Arguments:
python -m src.api.cli run --host 0.0.0.0 --port 8000 --workers 4
Database Configuration¶
The API requires PostgreSQL for user management, sessions, and audit logs.
Connection String Format:
postgresql+asyncpg://username:password@host:port/database
Configuration via Environment Variable:
export DATABASE_URL="postgresql+asyncpg://postgres:your_password@localhost:5432/codegraph"
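The asyncpg-style DSN follows standard URL syntax, so its components can be inspected with the standard library. A quick sanity check for your connection string (not part of CodeGraph itself):

```python
from urllib.parse import urlsplit

# Parse a SQLAlchemy/asyncpg-style connection string into its parts.
dsn = "postgresql+asyncpg://postgres:your_password@localhost:5432/codegraph"
parts = urlsplit(dsn)

print(parts.scheme)            # postgresql+asyncpg
print(parts.username)          # postgres
print(parts.hostname)          # localhost
print(parts.port)              # 5432
print(parts.path.lstrip("/"))  # codegraph
```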
Database Pool Settings:
Edit src/api/config.py to customize connection pool:
class DatabaseConfig(BaseModel):
url: str = "postgresql+asyncpg://postgres:postgres@localhost:5432/codegraph"
pool_size: int = 10 # Number of connections to maintain
max_overflow: int = 20 # Extra connections when pool is full
pool_timeout: int = 30 # Seconds to wait for connection
pool_recycle: int = 1800 # Recycle connections after 30 minutes
echo: bool = False # Log all SQL statements (debug only)
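With these defaults, each worker can open at most `pool_size + max_overflow` connections, and each worker process gets its own pool, so total connections scale with `API_WORKERS`. A back-of-the-envelope check:

```python
pool_size = 10     # connections kept open per worker
max_overflow = 20  # extra burst connections per worker
workers = 4        # API_WORKERS

per_worker = pool_size + max_overflow
total = per_worker * workers
print(per_worker, total)  # 30 120
```

Keep `total` below PostgreSQL's `max_connections` (100 by default), or raise that server setting accordingly.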
Authentication Configuration¶
JWT Authentication¶
Configure JWT token settings:
# Secret key for signing tokens (CHANGE IN PRODUCTION!)
export API_JWT_SECRET="your-secret-key-min-64-chars-recommended"
# Token expiration (default: 30 minutes for access, 7 days for refresh)
export API_JWT_ACCESS_EXPIRE_MINUTES="30"
export API_JWT_REFRESH_EXPIRE_DAYS="7"
In config.py:
class JWTConfig(BaseModel):
secret_key: str = "change-me-in-production-use-64-chars-minimum"
algorithm: str = "HS256"
access_token_expire_minutes: int = 30
refresh_token_expire_days: int = 7
Generate a secure secret key:
python -c "import secrets; print(secrets.token_urlsafe(64))"
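For intuition about what that secret signs: an HS256 token is two base64url-encoded JSON segments plus an HMAC-SHA256 signature over them. The sketch below is illustrative only; use a proper JWT library in production (CodeGraph's actual token handling lives behind `JWTConfig`):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_hs256(token: str, secret: str) -> bool:
    header, body, sig = token.split(".")
    expected = hmac.new(secret.encode(), f"{header}.{body}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(b64url(expected), sig)

token = sign_hs256({"sub": "admin"}, "dev-secret")
print(verify_hs256(token, "dev-secret"))    # True
print(verify_hs256(token, "wrong-secret"))  # False
```

This also shows why the secret's length matters: anyone who can guess it can forge valid tokens.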
API Keys¶
Enable API key authentication for programmatic access:
class AuthConfig(BaseModel):
api_keys_enabled: bool = True # Enable API key authentication
Create API keys via the API:
# Requires JWT token
curl -X POST http://localhost:8000/api/v1/auth/api-keys \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "My API Key",
"expires_days": 365,
"scopes": ["scenarios:read", "query:execute"]
}'
OAuth Providers¶
Configure OAuth 2.0 providers (GitHub, Google, GitLab, Keycloak):
# GitHub OAuth
export OAUTH_GITHUB_CLIENT_ID="your_github_client_id"
export OAUTH_GITHUB_CLIENT_SECRET="your_github_client_secret"
# Google OAuth
export OAUTH_GOOGLE_CLIENT_ID="your_google_client_id"
export OAUTH_GOOGLE_CLIENT_SECRET="your_google_client_secret"
# Keycloak
export OAUTH_KEYCLOAK_SERVER_URL="https://keycloak.example.com"
export OAUTH_KEYCLOAK_REALM="your_realm"
export OAUTH_KEYCLOAK_CLIENT_ID="your_client_id"
export OAUTH_KEYCLOAK_CLIENT_SECRET="your_client_secret"
LDAP/Active Directory¶
Configure LDAP for enterprise authentication:
export LDAP_SERVER="ldap.example.com"
export LDAP_BASE_DN="dc=example,dc=com"
export LDAP_BIND_USER="cn=admin,dc=example,dc=com"
export LDAP_BIND_PASSWORD="admin_password"
export LDAP_USER_SEARCH_BASE="ou=users,dc=example,dc=com"
export LDAP_GROUP_SEARCH_BASE="ou=groups,dc=example,dc=com"
Rate Limiting Configuration¶
Protect your API with rate limiting:
class RateLimitConfig(BaseModel):
enabled: bool = True
storage: str = "memory" # "memory" or "redis://localhost:6379"
default_limits: List[str] = ["100/minute", "1000/hour"]
endpoint_limits: Dict[str, str] = {
"/api/v1/review/*": "10/minute",
"/api/v1/chat": "60/minute",
"/api/v1/chat/stream": "30/minute",
"/api/v1/query/execute": "30/minute",
"/api/v1/demo/chat": "30/minute",
}
Use Redis for production (shared across workers):
export RATE_LIMIT_STORAGE="redis://localhost:6379"
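Limit strings like `100/minute` follow the common `count/period` convention used by slowapi/flask-limiter-style limiters. A minimal parser, assuming only the singular period names that appear in this config:

```python
PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

def parse_limit(spec: str) -> tuple[int, int]:
    """Turn a spec like '100/minute' into (max_requests, window_seconds)."""
    count, period = spec.split("/")
    return int(count), PERIOD_SECONDS[period]

print(parse_limit("100/minute"))  # (100, 60)
print(parse_limit("1000/hour"))   # (1000, 3600)
```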
CORS Configuration¶
Configure Cross-Origin Resource Sharing for web frontends:
class CORSConfig(BaseModel):
allowed_origins: List[str] = [
"http://localhost:8080",
"http://127.0.0.1:8080",
"http://localhost:3000",
"http://127.0.0.1:3000",
]
allowed_methods: List[str] = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
allowed_headers: List[str] = ["*"]
allow_credentials: bool = True
max_age: int = 600 # Preflight cache duration (seconds)
Demo Endpoint Configuration¶
Configure the public demo endpoint:
export DEMO_ENABLED="true"
export DEMO_RATE_LIMIT="30/minute"
class DemoConfig(BaseModel):
enabled: bool = True
rate_limit: str = "30/minute"
allowed_scenarios: List[str] = ["onboarding"]
max_query_length: int = 500
LLM Provider Configuration¶
Configure the LLM provider used for code analysis and query generation.
GigaChat¶
export GIGACHAT_AUTH_KEY="your_gigachat_key"
# config.yaml
llm:
provider: gigachat
model: GigaChat-2-Pro
temperature: 0.1
max_tokens: 4096
Local LLM (llama-cpp-python)¶
export QWEN3_MODEL_PATH="/path/to/qwen3/model.gguf"
# config.yaml
llm:
provider: local
model_path: ${QWEN3_MODEL_PATH}
  n_gpu_layers: -1 # Offload all layers to the GPU (-1 = all)
n_ctx: 8192 # Context window size
n_threads: 8 # CPU threads for inference
temperature: 0.1
max_tokens: 4096
OpenAI¶
export OPENAI_API_KEY="your_openai_key"
# config.yaml
llm:
provider: openai
model: gpt-4
temperature: 0.1
max_tokens: 4096
api_base: https://api.openai.com/v1
Yandex Cloud AI Studio (YandexGPT)¶
export YANDEX_API_KEY="your_yandex_api_key"
export YANDEX_FOLDER_ID="your_folder_id"
# config.yaml
llm:
provider: yandex
yandex:
api_key: ${YANDEX_API_KEY}
folder_id: ${YANDEX_FOLDER_ID}
model: yandexgpt/latest # or: yandexgpt-lite/latest, yandexgpt/rc
base_url: https://llm.api.cloud.yandex.net/v1
temperature: 0.7
max_tokens: 2000
timeout: 60
embedding_model: text-search-doc/latest
Available models:
- yandexgpt/latest - YandexGPT (main model)
- yandexgpt-lite/latest - YandexGPT Lite (faster, smaller)
- yandexgpt/rc - Release Candidate with reasoning
- yandexgpt-32k/latest - Extended context (32K tokens)
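Values like `${YANDEX_API_KEY}` above are placeholders expanded from the environment when the config is loaded. A sketch of that substitution (CodeGraph's actual loader may differ; unknown variables are left untouched here):

```python
import os
import re

_VAR = re.compile(r"\$\{(\w+)\}")

def expand_env(value: str) -> str:
    # Replace ${NAME} with os.environ[NAME]; leave unknown names as-is.
    return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["YANDEX_API_KEY"] = "example-key"
print(expand_env("${YANDEX_API_KEY}"))  # example-key
print(expand_env("${UNSET_VAR}"))       # ${UNSET_VAR}
```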
Domain Configuration¶
Switch between different codebases for analysis:
# config.yaml
domain:
name: postgresql_v2 # postgresql_v2, generic_cpp, go, java, python_django, etc.
auto_activate: true
Available domains:
- postgresql_v2 — PostgreSQL 17.6 database
- generic_cpp — Generic C/C++ codebase
- go — Go codebase
- java — Java/JVM codebase
- python_django — Python Django web framework
- python_generic — Generic Python codebase
- javascript — JavaScript/TypeScript
- csharp — C#/.NET
- kotlin — Kotlin/JVM
- onec — 1C:Enterprise (BSL/SDBL)
- php — PHP
CPG Database Configuration¶
Configure the Code Property Graph database (for code analysis):
# config.yaml
cpg:
type: postgresql # Database type for CPG storage
db_path: data/projects/postgres.duckdb # Per-project DuckDB path
Retrieval Settings¶
Configure semantic search and retrieval:
# config.yaml
retrieval:
embedding_model: all-MiniLM-L6-v2 # Sentence transformer model
embedding_dimension: 384 # Vector dimension
top_k_qa: 3 # Top results for Q&A retrieval
top_k_graph: 5 # Top results for graph (SQL) retrieval
max_results: 50 # Maximum search results
chunk_size: 512 # Text chunk size for embeddings
# Hybrid retrieval (vector + graph)
hybrid:
enabled: true
vector_weight: 0.6 # Semantic search weight
graph_weight: 0.4 # Structural search weight
rrf_k: 60 # Reciprocal Rank Fusion constant
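Hybrid mode merges the two result lists with weighted Reciprocal Rank Fusion: each document scores `weight / (rrf_k + rank)` for every list it appears in, and the sums are ranked. A sketch with the weights above (CodeGraph's exact fusion code may differ):

```python
def rrf_fuse(rankings: list[list[str]], weights: list[float], k: int = 60) -> list[str]:
    """Weighted Reciprocal Rank Fusion over ranked result lists."""
    scores: dict[str, float] = {}
    for weight, ranking in zip(weights, rankings):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["parse.c", "scan.l", "exec.c"]  # semantic search order
graph_hits = ["scan.l", "gram.y"]              # structural search order
print(rrf_fuse([vector_hits, graph_hits], weights=[0.6, 0.4]))
# scan.l ranks first because it appears near the top of both lists
```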
Query Limits¶
# config.yaml
query:
default_limit: 100 # Default LIMIT for SQL queries
max_limit: 1000 # Maximum allowed LIMIT
Analysis Settings¶
Configure analysis thresholds:
# config.yaml
analysis:
sanitization_confidence_threshold: 0.7 # Confidence for sanitization detection
similarity_threshold: 0.8 # Semantic similarity threshold
complexity_threshold: 10 # Cyclomatic complexity threshold
Structural Patterns Configuration¶
Configure structural pattern search, scanning, and rewriting:
# config.yaml
patterns:
enabled: true # Enable/disable patterns feature
rule_dirs: # Default rule directory paths
- ".codegraph/rules"
match_timeout: 30 # Timeout (seconds) for pattern matching
max_matches_per_rule: 10000 # Maximum matches returned per rule
max_rules: 500 # Maximum rules to load
incremental:
enabled: true # Enable incremental evaluation
max_dependent_files: 100 # Max files to track for incremental updates
rewrite:
backup: true # Backup files before applying fixes
max_fixes_per_file: 50 # Max fixes to apply per file
require_approval: true # Require user approval before applying fixes
Configuration Options¶
| Key | Type | Default | Description |
|---|---|---|---|
| `patterns.enabled` | boolean | `true` | Enable/disable the structural patterns feature |
| `patterns.rule_dirs` | list | `[".codegraph/rules"]` | Directories to load YAML rules from |
| `patterns.match_timeout` | integer | `30` | Timeout in seconds for pattern matching per rule |
| `patterns.max_matches_per_rule` | integer | `10000` | Cap on matches returned per rule |
| `patterns.max_rules` | integer | `500` | Maximum number of rules to load |
| `patterns.incremental.enabled` | boolean | `true` | Enable incremental evaluation (re-scan only changed files) |
| `patterns.incremental.max_dependent_files` | integer | `100` | Maximum dependent files to track for incremental mode |
| `patterns.rewrite.backup` | boolean | `true` | Create backups before applying fix rewrites |
| `patterns.rewrite.max_fixes_per_file` | integer | `50` | Maximum fix rewrites to apply per file |
| `patterns.rewrite.require_approval` | boolean | `true` | Require user approval before applying fixes |
Custom Rule Directories¶
Add project-specific rules by extending the rule_dirs list:
patterns:
rule_dirs:
- ".codegraph/rules" # Default rules
- "./my-project-rules" # Project-specific rules
- "/shared/team-rules" # Shared team rules
Rules in later directories take precedence over earlier ones when rule IDs conflict.
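That precedence rule amounts to a last-writer-wins merge keyed by rule ID. A sketch of the merge order (the rule shapes here are hypothetical; CodeGraph's loader may differ):

```python
def merge_rules(rule_sets: list[list[dict]]) -> dict[str, dict]:
    """Merge rule lists in rule_dirs order; later sets win on ID conflict."""
    merged: dict[str, dict] = {}
    for rules in rule_sets:
        for rule in rules:
            merged[rule["id"]] = rule
    return merged

default_rules = [{"id": "no-eval", "severity": "warning"}]
team_rules = [{"id": "no-eval", "severity": "error"}]  # later dir overrides
print(merge_rules([default_rules, team_rules])["no-eval"]["severity"])  # error
```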
Environment Variables Reference¶
Create a .env file in the project root:
# =============================================================================
# API Server Configuration
# =============================================================================
API_HOST=0.0.0.0
API_PORT=8000
API_WORKERS=4
API_DEBUG=false
# =============================================================================
# Database Configuration
# =============================================================================
DATABASE_URL=postgresql+asyncpg://postgres:your_password@localhost:5432/codegraph
# =============================================================================
# Authentication
# =============================================================================
API_JWT_SECRET=your-secret-key-min-64-chars-change-in-production
API_JWT_ALGORITHM=HS256
# Admin user (for initial setup)
API_ADMIN_USERNAME=admin
API_ADMIN_PASSWORD=change-this-password
# =============================================================================
# LLM API Providers
# =============================================================================
GIGACHAT_AUTH_KEY=your_gigachat_key
OPENAI_API_KEY=your_openai_key
YANDEX_API_KEY=your_yandex_api_key
YANDEX_FOLDER_ID=your_yandex_folder_id
# =============================================================================
# Local Model Paths
# =============================================================================
QWEN3_MODEL_PATH=/path/to/qwen3/model.gguf
# =============================================================================
# OAuth Providers (Optional)
# =============================================================================
OAUTH_GITHUB_CLIENT_ID=your_github_client_id
OAUTH_GITHUB_CLIENT_SECRET=your_github_client_secret
OAUTH_GOOGLE_CLIENT_ID=your_google_client_id
OAUTH_GOOGLE_CLIENT_SECRET=your_google_client_secret
# =============================================================================
# LDAP Configuration (Optional)
# =============================================================================
LDAP_SERVER=ldap.example.com
LDAP_BASE_DN=dc=example,dc=com
LDAP_BIND_USER=cn=admin,dc=example,dc=com
LDAP_BIND_PASSWORD=admin_password
LDAP_USER_SEARCH_BASE=ou=users,dc=example,dc=com
LDAP_GROUP_SEARCH_BASE=ou=groups,dc=example,dc=com
# =============================================================================
# Rate Limiting
# =============================================================================
RATE_LIMIT_STORAGE=memory # or redis://localhost:6379
# =============================================================================
# Demo Endpoint
# =============================================================================
DEMO_ENABLED=true
DEMO_RATE_LIMIT=30/minute
# =============================================================================
# CPG Database Settings
# =============================================================================
CPG_DB_PATH=data/projects/postgres.duckdb
# CPG_TYPE=postgresql # Domain type for analysis
# =============================================================================
# Logging
# =============================================================================
LOG_LEVEL=INFO
LLM_VERBOSE=false
# =============================================================================
# Performance
# =============================================================================
CUDA_VISIBLE_DEVICES=0
OMP_NUM_THREADS=8
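For reference, a minimal reader for this file format (a sketch only; real loaders such as python-dotenv also handle quoting, `export` prefixes, and multi-line values):

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = "# Logging\nLOG_LEVEL=INFO\nAPI_PORT=8000\n"
print(parse_dotenv(sample))  # {'LOG_LEVEL': 'INFO', 'API_PORT': '8000'}
```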
Performance Tuning¶
For Production (Multiple Workers)¶
# Use multiple workers for better throughput
python -m src.api.cli run --host 0.0.0.0 --port 8000 --workers 4
# Use Redis for rate limiting (shared across workers)
export RATE_LIMIT_STORAGE="redis://localhost:6379"
# Increase database pool size
# Edit src/api/config.py:
# pool_size: 20
# max_overflow: 40
For Development (Fast Reload)¶
# Single worker with auto-reload
python -m src.api.cli run --host 127.0.0.1 --port 8000 --reload --log-level debug
# Enable SQL logging
# Edit src/api/config.py:
# echo: True
For Limited Resources¶
# config.yaml
retrieval:
batch_size: 50 # Lower from 100
top_k_qa: 3 # Lower from 10
top_k_graph: 3
llm:
max_tokens: 2048 # Lower from 4096
Security Best Practices¶
- Change default passwords:

  # Generate secure JWT secret
  python -c "import secrets; print(secrets.token_urlsafe(64))"

  # Set strong admin password
  python -m src.api.cli create-admin --username admin --password

- Use environment variables for secrets:
  - Never commit .env files to version control
  - Add .env to .gitignore

- Enable HTTPS in production:
  - Use a reverse proxy (nginx, Caddy, Traefik)
  - Or configure uvicorn with SSL certificates

- Restrict CORS origins:

  allowed_origins: List[str] = [
      "https://yourapp.example.com",  # Production frontend only
  ]

- Configure rate limiting:
  - Use Redis for distributed rate limiting
  - Adjust limits based on your capacity
Multi-Tenant Isolation¶
Enable multi-tenant project isolation to let multiple teams share one CodeGraph instance:
multi_tenant:
enabled: false # true = enforce group-level RBAC on all data endpoints
max_project_connections: 10 # Max concurrent CPGQueryService instances (LRU cache)
max_harness_instances: 5 # Max concurrent Harness instances (LRU cache)
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `false` | Enable group-level RBAC enforcement. When `false`, all group checks are no-ops. |
| `max_project_connections` | int | `10` | LRU cache size for project-scoped CPG connections. Evicted connections are closed. |
| `max_harness_instances` | int | `5` | LRU cache size for project-scoped Harness instances. |
When enabled, every API request resolves a ProjectContext via:
1. X-Project-Id header (UUID) — explicit project selection
2. User’s active project — from user_active_project table
3. Global fallback — uses ProjectManager singleton (CLI/TUI/demo mode)
Migration for existing installations: Run python scripts/migrate_default_group.py once to create a default group with all existing users and projects.
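The `max_project_connections` cap can be pictured as an LRU cache that closes whatever it evicts. A sketch of that behavior (the service shape and its `close()` method are assumptions for illustration):

```python
from collections import OrderedDict

class ProjectConnectionCache:
    """LRU cache for project-scoped connections; evicted entries are closed."""

    def __init__(self, max_size: int = 10):
        self.max_size = max_size
        self._cache: OrderedDict = OrderedDict()

    def get(self, project_id, factory):
        if project_id in self._cache:
            self._cache.move_to_end(project_id)  # mark as most recently used
            return self._cache[project_id]
        conn = factory(project_id)
        self._cache[project_id] = conn
        if len(self._cache) > self.max_size:
            _, evicted = self._cache.popitem(last=False)  # drop least recently used
            evicted.close()
        return conn

class FakeService:
    def __init__(self, project_id):
        self.project_id = project_id
        self.closed = False
    def close(self):
        self.closed = True

cache = ProjectConnectionCache(max_size=2)
a = cache.get("a", FakeService)
b = cache.get("b", FakeService)
cache.get("a", FakeService)      # touch "a", so "b" becomes least recently used
cache.get("c", FakeService)      # exceeds max_size: evicts and closes "b"
print(b.closed)  # True
```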
Validation¶
Test your configuration:
# Test database connection
python -c "
from src.api.database.connection import check_db_connection
import asyncio
print('Database OK' if asyncio.run(check_db_connection()) else 'Database FAILED')
"
# Test API health
curl http://localhost:8000/api/v1/health
Next Steps¶
- Installation Guide - Set up the system
- Quick Start Guide - Get started quickly
- TUI User Guide - Learn to use the system
- API Reference - Explore API endpoints