Installation Guide

Complete installation instructions for CodeGraph.

System Requirements

Hardware

  • CPU: 8+ cores recommended
  • RAM: 16GB minimum (32GB+ for large codebases with local LLM)
  • GPU: NVIDIA RTX 3090 or better (optional, for local LLM)
  • Storage: 50GB free space

Software

  • Windows 10/11 or Linux
  • Python 3.10+ (3.11 recommended)
  • PostgreSQL 15+ (required for API server)
  • Git
  • CUDA Toolkit 11.8+ (optional, for GPU-accelerated local LLM)
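The requirements that can be checked without extra tooling (Python version, Git on PATH, free disk space) can be verified before starting. The following is a minimal sketch, not part of CodeGraph itself; the function name `check_requirements` and the constants are hypothetical, and the thresholds are copied from the lists above.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)   # "Python 3.10+" from the Software list
MIN_FREE_GB = 50       # "50GB free space" from the Hardware list


def check_requirements(path=".", version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.10+ required")
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free at {path!r}, {MIN_FREE_GB} GB recommended")
    return problems


# Report anything that does not meet the documented requirements.
for problem in check_requirements():
    print("WARNING:", problem)
```

The sketch deliberately skips PostgreSQL, CUDA, and GPU detection, which are covered by the verification commands in the later steps.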

Step 1: Environment Setup

# Clone repository
git clone <repository-url>
cd codegraph

# Create conda environment (recommended)
conda create -n codegraph python=3.11
conda activate codegraph

# OR create venv
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Step 2: PostgreSQL Database Setup

Install PostgreSQL

Windows:

# Download and install PostgreSQL 17 from:
# https://www.postgresql.org/download/windows/

# Or use Chocolatey:
choco install postgresql

Linux:

# Ubuntu/Debian
sudo apt update
sudo apt install postgresql postgresql-contrib

# Fedora/RHEL
sudo dnf install postgresql-server postgresql-contrib
sudo postgresql-setup --initdb
sudo systemctl enable postgresql
sudo systemctl start postgresql

Verify PostgreSQL Installation

# Check if PostgreSQL is running
# Windows (PowerShell):
Get-Service postgresql*

# Linux:
sudo systemctl status postgresql

Configure Database Password

Set a password for the postgres user:

# Linux:
sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'your_password';"

# Windows:
# Open psql as postgres user and run:
# ALTER USER postgres PASSWORD 'your_password';

Important: Remember this password - you’ll need it for the DATABASE_URL.

Step 3: Initialize Database

Set the database connection string as an environment variable:

# Replace 'your_password' with your actual postgres password
# Linux:
export DATABASE_URL="postgresql+asyncpg://postgres:your_password@localhost:5432/codegraph"

# Windows PowerShell:
$env:DATABASE_URL="postgresql+asyncpg://postgres:your_password@localhost:5432/codegraph"
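If the password contains characters such as `@`, `:` or `/`, pasting it into the URL unchanged will usually produce a connection string that parses incorrectly. The sketch below builds the same URL with the credentials percent-encoded. It assumes the URL is parsed SQLAlchemy-style (as the `postgresql+asyncpg` scheme suggests); `build_database_url` is a hypothetical helper, not a CodeGraph function.

```python
from urllib.parse import quote


def build_database_url(password, user="postgres", host="localhost",
                       port=5432, database="codegraph"):
    """Build the postgresql+asyncpg URL, percent-encoding user and password."""
    return (
        f"postgresql+asyncpg://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


print(build_database_url("p@ss:word/1"))
# prints postgresql+asyncpg://postgres:p%40ss%3Aword%2F1@localhost:5432/codegraph
```

The printed value can be assigned to DATABASE_URL exactly as shown above for either shell.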

Initialize the database using the project CLI:

# Create database and run migrations
python -m src.api.cli init-db

This command will:

  1. Create the codegraph database
  2. Run Alembic migrations to create all tables
  3. Initialize the database schema

Verify Database

# Check tables were created
# Windows:
"C:\Program Files\PostgreSQL\17\bin\psql.exe" -U postgres -d codegraph -c "\dt"

# Linux:
psql -U postgres -d codegraph -c "\dt"

Expected tables:

  • users
  • api_keys
  • sessions
  • dialogue_turns
  • audit_log
  • background_jobs
  • token_blacklist
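Comparing the `\dt` output against the expected list by eye is error-prone. The following sketch reports which expected tables are missing from a list of names; `missing_tables` is a hypothetical helper, and the table names are taken from the list above. Extra tables (for example Alembic's own `alembic_version`) are ignored.

```python
EXPECTED_TABLES = {
    "users", "api_keys", "sessions", "dialogue_turns",
    "audit_log", "background_jobs", "token_blacklist",
}


def missing_tables(found):
    """Return the expected tables that are absent from `found`, sorted by name."""
    return sorted(EXPECTED_TABLES - {name.strip().lower() for name in found})


# Example: names copied from the output of a partially migrated database.
print(missing_tables(["users", "api_keys", "sessions", "alembic_version"]))
# prints ['audit_log', 'background_jobs', 'dialogue_turns', 'token_blacklist']
```

An empty list means every expected table is present.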

Step 4: Create Admin User

# Create admin user with username and password
python -m src.api.cli create-admin --username admin --password <your_admin_password>

# Optionally add email
python -m src.api.cli create-admin --username admin --password <your_admin_password> --email admin@example.com

Save your admin credentials - you’ll need them to access the API.

Step 5: LLM Provider Setup (Optional)

The API server can run without an LLM provider configured (for testing). Configure one of the following providers for full functionality.

Option A: GigaChat

# Set environment variable
export GIGACHAT_AUTH_KEY="your_auth_key"

# Update config.yaml
# llm:
#   provider: gigachat

See GigaChat Integration for details.

Option B: Local LLM (llama-cpp-python)

# Install llama-cpp-python with CUDA support
# (recent releases use -DGGML_CUDA=on; older releases used -DLLAMA_CUDA=on)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

# Download model (Qwen3-Coder-30B recommended)
# Place in: ~/.lmstudio/models/

# Update config.yaml
# llm:
#   provider: local
#   model_path: path/to/model.gguf

Option C: OpenAI API

export OPENAI_API_KEY="your_api_key"

# Update config.yaml
# llm:
#   provider: openai
#   model: gpt-4
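A common failure is selecting a provider in config.yaml without exporting the matching key. The sketch below checks this before the server is started. The mapping reflects only the variables named in this step; `provider_problem` is a hypothetical helper, and how CodeGraph itself validates the configuration is not covered here.

```python
import os

# Environment variable each provider needs, taken from this step.
# "local" needs no key; it reads model_path from config.yaml instead.
REQUIRED_ENV = {
    "gigachat": "GIGACHAT_AUTH_KEY",
    "openai": "OPENAI_API_KEY",
    "local": None,
}


def provider_problem(provider, env=None):
    """Return an error message if the provider's key is missing, else None."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV:
        return f"unknown provider {provider!r}"
    variable = REQUIRED_ENV[provider]
    if variable and not env.get(variable):
        return f"{variable} is not set (needed for provider {provider!r})"
    return None


print(provider_problem("openai", env={}))
# prints OPENAI_API_KEY is not set (needed for provider 'openai')
```

Calling `provider_problem("openai")` without the `env` argument checks the real process environment.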

Step 6: Start API Server

# Set database URL (if not already set)
export DATABASE_URL="postgresql+asyncpg://postgres:your_password@localhost:5432/codegraph"

# Start the server
python -m src.api.cli run --host 127.0.0.1 --port 8000

# For development with auto-reload:
python -m src.api.cli run --host 127.0.0.1 --port 8000 --reload

The server will start on http://127.0.0.1:8000

Step 7: Verify Installation

Access API Documentation

Open your browser and visit:

  • Swagger UI: http://127.0.0.1:8000/api/docs
  • ReDoc: http://127.0.0.1:8000/api/redoc

Test Health Endpoint

curl http://127.0.0.1:8000/api/v1/health

Expected response:

{
  "status": "healthy",
  "version": "1.0.0",
  "components": {
    "database": {
      "status": "healthy",
      "database": "postgresql",
      "version": "PostgreSQL 17.x ..."
    }
  }
}
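For scripted checks (for example in a deployment script), the response can be inspected programmatically rather than by eye. This sketch assumes only the response shape shown above; `unhealthy_components` is a hypothetical helper.

```python
import json


def unhealthy_components(payload):
    """Return names of anything not reporting "healthy" in a health response."""
    bad = []
    if payload.get("status") != "healthy":
        bad.append("overall")
    for name, info in payload.get("components", {}).items():
        if info.get("status") != "healthy":
            bad.append(name)
    return bad


# A trimmed copy of the expected response shown above.
sample = json.loads("""
{"status": "healthy", "version": "1.0.0",
 "components": {"database": {"status": "healthy", "database": "postgresql"}}}
""")
print(unhealthy_components(sample))  # prints []
```

To run it against a live server, load the payload with `json.load(urllib.request.urlopen("http://127.0.0.1:8000/api/v1/health"))` and pass the result to the function.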

Test Authentication

# Get access token
curl -X POST http://127.0.0.1:8000/api/v1/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username":"admin","password":"your_admin_password"}'

Expected response:

{
  "access_token": "eyJ...",
  "refresh_token": "eyJ...",
  "token_type": "bearer",
  "expires_in": 1800
}

Test Authenticated Endpoint

# Replace TOKEN with your access_token from above
curl http://127.0.0.1:8000/api/v1/scenarios \
  -H "Authorization: Bearer TOKEN"
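The two curl calls above (obtain a token, then call an authenticated endpoint) can be chained from Python using only the standard library. This is a sketch under the assumption that the endpoints and payload shapes are exactly as shown in this step; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000/api/v1"


def token_request(username, password):
    """Build (but do not send) the POST request for /auth/token."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authorized_request(path, access_token):
    """Build a GET request that carries the bearer token."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_json(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `tokens = fetch_json(token_request("admin", "your_admin_password"))` followed by `fetch_json(authorized_request("/scenarios", tokens["access_token"]))` mirrors the curl sequence. Given `expires_in` of 1800 in the expected response, the access token would need to be refreshed after roughly 30 minutes.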

Step 8: CPG Data Setup

CodeGraph uses pre-exported CPG data stored in DuckDB. If you have CPG data:

# Place your CPG database in the project directory
cp /path/to/cpg.duckdb ./cpg.duckdb

# Update config.yaml with the path
# cpg:
#   db_path: cpg.duckdb

For creating new CPG exports from source code, see CPG Export Guide.

Note: Joern is no longer required for normal operation. CPG data is pre-exported to DuckDB format.

Troubleshooting

PostgreSQL Connection Failed

Error: connection to server at "localhost" failed

Solution:

# Check PostgreSQL is running
# Windows:
Get-Service postgresql*

# Linux:
sudo systemctl status postgresql

# Start if not running
# Windows (PowerShell, run as Administrator):
Start-Service postgresql*

# Linux:
sudo systemctl start postgresql

Password Authentication Failed

Error: password authentication failed for user "postgres"

Solution:

# Reset postgres password
# Linux:
sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'new_password';"

# Update DATABASE_URL with new password
export DATABASE_URL="postgresql+asyncpg://postgres:new_password@localhost:5432/codegraph"

Database Does Not Exist

Error: database "codegraph" does not exist

Solution:

# Create database manually
psql -U postgres -c "CREATE DATABASE codegraph ENCODING 'UTF8';"

# Then run init-db again
python -m src.api.cli init-db

Port 8000 Already in Use

Error: error while attempting to bind on address ('127.0.0.1', 8000)

Solution:

# Find process using port 8000
# Windows:
netstat -ano | findstr :8000

# Kill the process (replace PID with actual process ID)
taskkill /F /PID <PID>

# Linux:
lsof -ti:8000 | xargs kill -9

# Or use a different port
python -m src.api.cli run --host 127.0.0.1 --port 8001
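Instead of killing the process, a free port can be located first and passed to `--port`. The sketch below probes ports by attempting to bind to them; the helper names are hypothetical. A port reported as free can still be taken by another process before the server starts, so treat the result as a hint.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=8000, attempts=10, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port, host):
            return port
    return None


print(first_free_port())
```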

CUDA Not Found (for local LLM)

Error: CUDA not available

Solution:

# Check CUDA installation
nvcc --version
nvidia-smi

# Reinstall PyTorch with CUDA
pip uninstall torch
pip install torch --index-url https://download.pytorch.org/whl/cu118

Out of Memory

For systems with limited RAM:

# Reduce settings in config.yaml
retrieval:
  batch_size: 50  # Lower from 100
  top_k: 5        # Lower from 10

Next Steps