# Yandex AI Studio Integration Guide

Integration guide for Yandex Cloud AI Studio with CodeGraph, using the OpenAI-compatible API.
## Table of Contents

- Overview
- Quick Setup (3 Steps)
  - Step 1: Create API Key
  - Step 2: Set Environment Variables
  - Step 3: Configure config.yaml
- Available Models
  - Text Generation Models
  - Embedding Models
  - Model URI Format
- Usage in Code
  - Basic Usage
  - Streaming
  - Embeddings
  - With CodeGraph Workflow
- Configuration Reference
  - Full config.yaml Example
  - Environment Variables
  - Provider Parameters
- Privacy and Compliance
- Error Handling
  - Authentication Error (401)
  - Rate Limit Exceeded (429)
  - Connection Timeout
  - Folder ID Error
  - Retry Logic
- Best Practices
- Comparison with Other Providers
- Resources
- Next Steps
## Overview

Yandex Cloud AI Studio is a comprehensive AI platform providing access to:

- YandexGPT family of models (YandexGPT, YandexGPT-Lite, YandexGPT-32k)
- Open-source models (Qwen3, DeepSeek, OpenAI OSS)
- Embedding models for semantic search

CodeGraph integrates with Yandex AI Studio through the OpenAI-compatible API, which allows using the standard `openai` Python library.

**Key Features:**

- OpenAI API compatibility (use the familiar `openai` library)
- Privacy-focused: logging disabled by default via the `x-data-logging-enabled: false` header
- Multiple model options, from 20B to 235B parameters
- Built-in retry with exponential backoff
## Quick Setup (3 Steps)

### Step 1: Create API Key

1. Go to the Yandex Cloud Console
2. Navigate to **AI Studio** in the left menu
3. Click **Create new key** in the top panel
4. Select **Create API key**
5. In the **Scope** field, select `yc.ai.languageModels.execute`
6. Click **Create** and save both the ID and the secret key
7. Note your **Folder ID** from the folder selector in the top panel
### Step 2: Set Environment Variables

**Windows (PowerShell):**

```powershell
$env:YANDEX_API_KEY = "AQVNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
$env:YANDEX_FOLDER_ID = "b1gxxxxxxxxxxxxxxxxx"

# Permanent (survives restart)
[System.Environment]::SetEnvironmentVariable('YANDEX_API_KEY', 'AQVNxxx...', 'User')
[System.Environment]::SetEnvironmentVariable('YANDEX_FOLDER_ID', 'b1gxxx...', 'User')
```

**Linux/macOS:**

```bash
export YANDEX_API_KEY="AQVNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export YANDEX_FOLDER_ID="b1gxxxxxxxxxxxxxxxxx"

# Add to ~/.bashrc to make it permanent
echo 'export YANDEX_API_KEY="your_key"' >> ~/.bashrc
echo 'export YANDEX_FOLDER_ID="your_folder"' >> ~/.bashrc
```
### Step 3: Configure config.yaml

```yaml
llm:
  provider: yandex

  yandex:
    api_key: ${YANDEX_API_KEY}
    folder_id: ${YANDEX_FOLDER_ID}
    model: "qwen3-235b-a22b-fp8/latest"
    temperature: 0.7
    max_tokens: 2000
    timeout: 180
```

Verify the configuration:

```bash
python -c "
from src.llm.yandex_provider import YandexProvider
from src.llm.base_provider import LLMConfig
config = LLMConfig(provider_type='yandex')
provider = YandexProvider(config)
print(f'Provider: {provider}')
print('Connection successful!')
"
```
## Available Models

### Text Generation Models

| Model | URI | Context | Best For |
|---|---|---|---|
| Qwen3 235B | `qwen3-235b-a22b-fp8/latest` | 262K | Complex code analysis, security review (default) |
| gpt-oss-120b | `gpt-oss-120b/latest` | 131K | OpenAI OSS reasoning model, complex tasks |
| gpt-oss-20b | `gpt-oss-20b/latest` | 131K | OpenAI OSS model, balanced performance |
| Gemma 3 27B | `gemma-3-27b-it/latest` | 131K | Google's open model, instruction-tuned |
| YandexGPT Pro 5 | `yandexgpt/latest` | 32K | General purpose, excellent Russian support |
| YandexGPT Pro 5.1 | `yandexgpt/rc` | 32K | Latest features, improved reasoning |
| YandexGPT Lite 5 | `yandexgpt-lite` | 32K | Fast responses, simple queries |
| Alice AI LLM | `aliceai-llm` | 32K | Conversational AI, dialogue systems |
| Fine-tuned YandexGPT Lite | `yandexgpt-lite/latest@<suffix>` | 32K | Custom fine-tuned models |

**Recommendations:**

- **Code analysis:** use `qwen3-235b-a22b-fp8/latest` (262K context, best quality)
- **Security review:** use `gpt-oss-120b/latest` (reasoning capabilities)
- **Fast queries:** use `yandexgpt-lite` for speed
- **Russian text:** use `yandexgpt/latest` for native Russian support
- **Large files:** use models with 131K+ context (Qwen3, gpt-oss, Gemma)
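These recommendations can be captured in a small lookup helper. This is an illustrative sketch, not part of CodeGraph's API; the task names and the mapping are assumptions, while the model names come from the table above.

```python
# Map common tasks to the model names recommended above (illustrative mapping).
TASK_MODELS = {
    "code_analysis": "qwen3-235b-a22b-fp8/latest",  # 262K context, best quality
    "security_review": "gpt-oss-120b/latest",       # reasoning capabilities
    "fast_query": "yandexgpt-lite",                 # lowest latency
    "russian_text": "yandexgpt/latest",             # native Russian support
}

def pick_model(task: str) -> str:
    """Return the recommended model for a task, defaulting to Qwen3 235B."""
    return TASK_MODELS.get(task, "qwen3-235b-a22b-fp8/latest")
```

The default mirrors the document's default model, so unknown task names still resolve to a usable choice.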
### Embedding Models

| Model | Dimensions | Use Case |
|---|---|---|
| `text-search-doc/latest` | 256 | Document embeddings (default) |
| `text-search-query/latest` | 256 | Query embeddings |
### Model URI Format

Yandex requires a specific model URI format:

```
gpt://<folder_id>/<model_name>
emb://<folder_id>/<embedding_model>
```

Examples:

```
gpt://b1g123456789/qwen3-235b-a22b-fp8/latest
gpt://b1g123456789/yandexgpt/latest
emb://b1g123456789/text-search-doc/latest
```

The CodeGraph provider constructs this automatically from your `folder_id` and `model` settings.
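The format reduces to a one-line helper. This is a sketch of what the provider does internally, not CodeGraph's actual function:

```python
def build_model_uri(folder_id: str, model: str, scheme: str = "gpt") -> str:
    """Build a Yandex model URI: gpt://<folder_id>/<model> or emb://<folder_id>/<model>."""
    return f"{scheme}://{folder_id}/{model}"

# Matches the examples above
print(build_model_uri("b1g123456789", "qwen3-235b-a22b-fp8/latest"))
# gpt://b1g123456789/qwen3-235b-a22b-fp8/latest
print(build_model_uri("b1g123456789", "text-search-doc/latest", scheme="emb"))
# emb://b1g123456789/text-search-doc/latest
```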
## Usage in Code

### Basic Usage

```python
import os

from openai import OpenAI

# Initialize the client with the Yandex endpoint
client = OpenAI(
    api_key=os.getenv("YANDEX_API_KEY"),
    base_url="https://llm.api.cloud.yandex.net/v1",
    default_headers={
        "x-data-logging-enabled": "false",  # Privacy
        "x-folder-id": os.getenv("YANDEX_FOLDER_ID"),
    },
)

# Construct the model URI
folder_id = os.getenv("YANDEX_FOLDER_ID")
model_uri = f"gpt://{folder_id}/qwen3-235b-a22b-fp8/latest"

# Make a request
response = client.chat.completions.create(
    model=model_uri,
    messages=[
        {"role": "system", "content": "You are a code analysis expert."},
        {"role": "user", "content": "Explain MVCC in PostgreSQL."},
    ],
    temperature=0.7,
    max_tokens=2000,
)

print(response.choices[0].message.content)
```
### Streaming

```python
# Stream the response chunk by chunk
stream = client.chat.completions.create(
    model=model_uri,
    messages=[
        {"role": "user", "content": "List security best practices for C code."},
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
### Embeddings

```python
# Get embeddings (reuses the client and folder_id from above)
embedding_uri = f"emb://{folder_id}/text-search-doc/latest"

response = client.embeddings.create(
    model=embedding_uri,
    input="Find buffer overflow vulnerabilities",
    encoding_format="float",
)

embedding = response.data[0].embedding
print(f"Embedding dimension: {len(embedding)}")  # 256
```
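Since documents and queries use separate models (`text-search-doc` vs `text-search-query`), a typical search flow embeds both and ranks documents by cosine similarity. A minimal sketch of the ranking step, in pure Python with no external dependencies:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by similarity to the query, best first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

In practice `query_vec` would come from `text-search-query/latest` and each entry in `doc_vecs` from `text-search-doc/latest`, all 256-dimensional per the table above.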
### With CodeGraph Workflow

```python
import os

from src.llm.yandex_provider import YandexProvider
from src.llm.base_provider import LLMConfig

# Initialize the provider (reads from config.yaml)
config = LLMConfig(
    provider_type='yandex',
    temperature=0.7,
    max_tokens=2000,
    extra_params={
        'api_key': os.getenv('YANDEX_API_KEY'),
        'folder_id': os.getenv('YANDEX_FOLDER_ID'),
        'model': 'qwen3-235b-a22b-fp8/latest',
    }
)
provider = YandexProvider(config)

# Generate a response
response = provider.generate(
    system_prompt="You are a PostgreSQL security expert.",
    user_prompt="Find SQL injection vulnerabilities in the executor module."
)

print(response.content)
print(f"Tokens used: {response.metadata['tokens_used']}")
print(f"Latency: {response.metadata['latency_ms']:.0f}ms")
```

Using with the workflow:

```python
from src.workflow.orchestration.copilot import Copilot

# Copilot automatically uses the configured provider
copilot = Copilot()
result = copilot.answer("Find memory allocation functions in PostgreSQL")
print(result['answer'])
```
## Configuration Reference

### Full config.yaml Example

```yaml
llm:
  # Set Yandex as the active provider
  provider: yandex

  yandex:
    # Required: authentication
    api_key: ${YANDEX_API_KEY}      # Service account API key
    folder_id: ${YANDEX_FOLDER_ID}  # Yandex Cloud folder ID

    # Model selection
    model: "qwen3-235b-a22b-fp8/latest"  # Best for code analysis

    # API endpoint (default, usually not changed)
    base_url: "https://llm.api.cloud.yandex.net/v1"

    # Timeouts
    timeout: 180  # seconds (increased for large prompts)

    # Generation parameters
    temperature: 0.7  # Creativity (0.0-1.0)
    max_tokens: 2000  # Max response length
    top_p: null       # Use Yandex default

    # Embedding model
    embedding_model: "text-search-doc/latest"
```
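The `${YANDEX_API_KEY}` and `${YANDEX_FOLDER_ID}` placeholders are resolved from the environment when the config is loaded. As a sketch of how such substitution can work (the standard library's `os.path.expandvars` handles `${VAR}` references; CodeGraph's actual loader may differ):

```python
import os

def expand_env(value: str) -> str:
    """Expand ${VAR} / $VAR references in a config value from the environment."""
    return os.path.expandvars(value)

# Example: with YANDEX_FOLDER_ID set in the environment
os.environ["YANDEX_FOLDER_ID"] = "b1gexample"
print(expand_env("${YANDEX_FOLDER_ID}"))  # b1gexample
```

Unset variables are left as-is by `expandvars`, which makes missing credentials easy to spot in error messages.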
### Environment Variables

| Variable | Required | Description |
|---|---|---|
| `YANDEX_API_KEY` | Yes | API key from the Yandex Cloud Console |
| `YANDEX_FOLDER_ID` | Yes | Folder ID where AI Studio is enabled |
### Provider Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | string | - | Yandex Cloud API key (required) |
| `folder_id` | string | - | Yandex Cloud folder ID (required) |
| `model` | string | `qwen3-235b-a22b-fp8/latest` | Text generation model |
| `base_url` | string | `https://llm.api.cloud.yandex.net/v1` | API endpoint |
| `timeout` | int | 60 | Request timeout in seconds |
| `temperature` | float | 0.7 | Response randomness (0.0-1.0) |
| `max_tokens` | int | 2000 | Maximum response tokens |
| `top_p` | float | null | Nucleus sampling parameter |
| `embedding_model` | string | `text-search-doc/latest` | Embedding model |
## Privacy and Compliance

CodeGraph automatically disables Yandex-side logging for privacy/GDPR compliance:

```python
default_headers={
    "x-data-logging-enabled": "false",  # Disables logging on the Yandex side
    "x-folder-id": folder_id,
}
```

This ensures:

- Your prompts are not logged by Yandex
- Response data is not stored
- Compliance with data protection regulations
## Error Handling

### Authentication Error (401)

**Error:** `401 Unauthorized`

**Solutions:**

```bash
# Check that the variables are set
echo $YANDEX_API_KEY
echo $YANDEX_FOLDER_ID

# Verify the API key format (should start with AQVNw, AQVN_, or similar)
python -c "import os; key = os.getenv('YANDEX_API_KEY', ''); print(f'Key prefix: {key[:5]}...' if key else 'NOT SET')"

# Verify that the API key scope includes yc.ai.languageModels.execute
```
### Rate Limit Exceeded (429)

**Error:** `429 Too Many Requests`

**Solutions:**

Built-in retry handles this automatically, but you can increase the timeout for complex requests:

```yaml
yandex:
  timeout: 300  # 5 minutes
```

Or implement a custom retry:

```python
import time

for attempt in range(3):
    try:
        response = provider.generate(system_prompt, user_prompt)
        break
    except YandexRateLimitError:  # raised by the CodeGraph provider on HTTP 429
        time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s
```
### Connection Timeout

**Error:** `Request timeout`

**Solutions:**

```yaml
# Increase the timeout for large prompts/responses
yandex:
  timeout: 300  # seconds
```
### Folder ID Error

**Error:** `Folder not found or access denied`

**Solutions:**

1. Verify the folder ID in the Yandex Cloud Console (top panel dropdown)
2. Ensure AI Studio is enabled in the folder
3. Check that the API key has access to the folder
### Retry Logic

The Yandex provider includes built-in retry with exponential backoff:

```python
# Automatic retry for:
# - APITimeoutError (timeout)
# - APIConnectionError (connection issues)
#
# Retry parameters:
# - max_retries: 3
# - initial_delay: 2.0 seconds
# - max_delay: 30.0 seconds
# - backoff_factor: 2.0
```

Example retry sequence:

1. First attempt fails -> wait 2s
2. Second attempt fails -> wait 4s
3. Third attempt fails -> raise exception
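The schedule above (initial 2 s, factor 2, cap 30 s, 3 attempts) can be sketched as a generic wrapper. This is an illustration of the behavior, not the provider's actual implementation; the built-in `TimeoutError` and `ConnectionError` stand in for the openai error types named in the comments:

```python
import time

def with_retry(call, max_retries=3, initial_delay=2.0, max_delay=30.0,
               backoff_factor=2.0, retryable=(TimeoutError, ConnectionError)):
    """Retry call() on retryable errors with exponential backoff."""
    delay = initial_delay
    for attempt in range(1, max_retries + 1):
        try:
            return call()
        except retryable:
            if attempt == max_retries:
                raise  # third attempt fails -> propagate the exception
            time.sleep(delay)  # first failure -> 2s, second -> 4s
            delay = min(delay * backoff_factor, max_delay)
```

Usage: `with_retry(lambda: provider.generate(system_prompt, user_prompt))` reproduces the 2s/4s/raise sequence described above.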
## Best Practices

- **Use Qwen3-235B for code analysis** - best quality for complex queries
- **Use YandexGPT-Lite for simple queries** - faster response times
- **Set an appropriate timeout** - 180s for security analysis, 60s for simple queries
- **Monitor token usage** - response metadata includes token counts
- **Use streaming for long responses** - better UX for interactive applications
- **Store credentials in environment variables** - never commit API keys to git
## Comparison with Other Providers

| Feature | Yandex AI Studio | GigaChat | OpenAI |
|---|---|---|---|
| API Compatibility | OpenAI-compatible | Custom SDK | Native |
| Best Model | Qwen3 235B | GigaChat-2-Pro | GPT-4 |
| Privacy Header | Yes (`x-data-logging-enabled`) | No | No |
| Russian Support | Excellent | Excellent | Good |
| Pricing | Pay-per-token | Pay-per-token | Pay-per-token |
| Max Context | 262K (Qwen3) | 32K | 128K (GPT-4) |
| Open Models | Qwen3, Gemma, gpt-oss | No | No |
## Resources

- Yandex AI Studio Overview
- OpenAI Compatibility Documentation
- Available Models
- Yandex Cloud ML SDK
- VS Code Integration Tutorial

## Next Steps

- **Installation** - full setup guide
- **Configuration** - all settings
- **TUI User Guide** - using the system
- **GigaChat Integration** - alternative LLM provider