Public, unauthenticated endpoint for the landing page “Try it yourself” feature. Rate-limited per IP, restricted to the onboarding scenario only.
Endpoints
POST /api/v1/demo/chat
Send a natural-language query about the demo codebase (PostgreSQL 17.6).
Authentication: None (public endpoint)
Rate limit: 30 requests/minute per IP
Request
{
  "query": "Where is the main function?",
  "language": "en"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | n/a | Natural-language question (1–500 chars) |
| language | string | No | "ru" | Response language ("en" or "ru") |
Response (200)
Successful query:
{
  "answer": "The main function is defined in src/backend/main/main.c at line 53...",
  "scenario_id": "onboarding",
  "processing_time_ms": 234.5
}
Rejected query (off-topic, wrong scenario, or blocked content):
{
  "answer": "### Request Blocked\n\nThis type of request is not supported in the demo version...",
  "scenario_id": "demo_rejection",
  "processing_time_ms": 12.3
}
Internal error (LLM unavailable, etc.):
{
  "answer": "Sorry, the analysis system is temporarily unavailable. Please try again later.",
  "scenario_id": "error",
  "processing_time_ms": 1502.7
}
| Field | Type | Description |
|---|---|---|
| answer | string | LLM-generated response, rejection message, or error text |
| scenario_id | string | "onboarding" (success), "demo_rejection" (query rejected), or "error" (internal error) |
| processing_time_ms | float | Server-side processing time in milliseconds |
Error Responses
| Status | Cause | Body |
|---|---|---|
| 400 | Query too long (>500 chars) | {"detail": "Query too long. Maximum length is 500 characters."} |
| 422 | Pydantic validation failure (empty query, wrong types) | {"detail": [...]} |
| 429 | Rate limit exceeded | {"detail": "Too many requests"} |
| 503 | Demo mode disabled | {"detail": "Demo endpoint is currently disabled"} |
Note: Off-topic, wrong-scenario, and blocked-content queries return HTTP 200 with scenario_id: "demo_rejection" and a friendly message in the answer field. They are not HTTP errors, which lets the landing page display helpful guidance without triggering its error handlers.
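Because rejections and internal errors arrive as HTTP 200, a client has to branch on the status code first and on scenario_id second. The following is a minimal Python sketch of that dispatch. The function name classify_demo_response and the labels it returns are illustrative assumptions; only the status codes and scenario_id values come from the tables above.

```python
from typing import Any, Dict, Optional


def classify_demo_response(status: int, body: Optional[Dict[str, Any]]) -> str:
    """Map an HTTP status and parsed JSON body to a UI action label.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if status == 429:
        return "rate_limited"
    if status == 503:
        return "disabled"
    if status in (400, 422):
        return "invalid_request"
    if status != 200 or body is None:
        return "unexpected"

    scenario = body.get("scenario_id")
    if scenario == "demo_rejection":
        return "show_guidance"  # rejected query, still HTTP 200
    if scenario == "error":
        return "show_error"  # internal failure reported in the body
    return "show_answer"
```

A caller would pass the numeric status and the decoded JSON body, then pick the matching UI handler from the label.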
GET /api/v1/demo/status
Check demo endpoint availability and configuration.
Authentication: None
Response (200)
{
  "enabled": true,
  "rate_limit": "30/minute",
  "max_query_length": 500,
  "allowed_scenarios": ["onboarding"]
}
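A client could use this payload to pre-check a query before posting it, which avoids a round trip that is certain to fail. The sketch below assumes the payload shape shown above; the helper name precheck_query and its reason strings are hypothetical.

```python
from typing import Any, Dict, Optional


def precheck_query(status: Dict[str, Any], query: str) -> Optional[str]:
    """Return None if the query may be sent, else a short reason string.

    The reason strings are hypothetical; the limits come from the
    status payload, with the documented values as fallbacks.
    """
    if not status.get("enabled", False):
        return "demo disabled"
    if not query:
        return "empty query"
    if len(query) > int(status.get("max_query_length", 500)):
        return "query too long"
    return None
```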
Query Validation Pipeline
Incoming queries pass through a 3-stage validation pipeline before processing:
1. HARD REJECT: explicit malicious content (regex patterns from domain plugin)
   └─ Returns 200 with scenario_id="demo_rejection", rejection_reason="blocked_content"
2. WRONG SCENARIO: legitimate but outside onboarding scope
   └─ Returns 200 with scenario_id="demo_rejection", rejection_reason="wrong_scenario"
3. DOMAIN RELEVANCE: keyword/pattern scoring against domain plugin
   └─ Score ≥ 0.5 (LOW threshold) → accepted, forwarded to onboarding handler
   └─ Score < 0.5 → rejected with scenario_id="demo_rejection", rejection_reason="off_topic"
ValidationResult
The internal ValidationResult dataclass (demo.py:122):
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    is_valid: bool
    confidence: float
    rejection_reason: Optional[str] = None  # "off_topic" | "wrong_scenario" | "blocked_content"
    detected_scenario: Optional[str] = None
    method: str = "keyword"
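To show how the three stages and this dataclass fit together, here is a minimal sketch of the pipeline order. It is not the code in demo.py: the patterns, keywords, the confidence of 1.0 for regex stages, and the "regex" method label are all assumptions made for illustration, and keyword matching is simplified to substring checks.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ValidationResult:
    is_valid: bool
    confidence: float
    rejection_reason: Optional[str] = None
    detected_scenario: Optional[str] = None
    method: str = "keyword"


# Hypothetical stand-ins for values a domain plugin would supply.
HARD_REJECT: List[str] = [r"drop\s+table", r"rm\s+-rf"]
WRONG_SCENARIO: List[Tuple[str, str]] = [(r"\bvulnerab", "security_audit")]
KEYWORDS: List[str] = ["mvcc", "vacuum", "wal", "planner"]
LOW_THRESHOLD = 0.5


def validate(query: str) -> ValidationResult:
    text = query.lower()

    # Stage 1: hard reject on explicit malicious content.
    for pattern in HARD_REJECT:
        if re.search(pattern, text):
            return ValidationResult(False, 1.0, "blocked_content", method="regex")

    # Stage 2: legitimate request that belongs to another scenario.
    for pattern, scenario in WRONG_SCENARIO:
        if re.search(pattern, text):
            return ValidationResult(False, 1.0, "wrong_scenario", scenario, "regex")

    # Stage 3: keyword relevance, mapped to the documented threshold values.
    hits = sum(1 for kw in KEYWORDS if kw in text)
    score = {0: 0.1, 1: 0.5, 2: 0.75}.get(hits, 0.9)
    if score >= LOW_THRESHOLD:
        return ValidationResult(True, score)
    return ValidationResult(False, score, "off_topic")
```

The early returns make the stage order explicit: a query that matches a hard-reject pattern is never scored for relevance.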
Relevance Thresholds
Configured in src/config/demo.yaml → relevance.thresholds:
| Threshold | Value | Trigger |
|---|---|---|
| high | 0.9 | 3+ keyword matches |
| medium | 0.75 | 2 keyword matches |
| low | 0.5 | 1 keyword match (rejection boundary) |
| minimal | 0.1 | 0 keyword matches |
Queries with a score below low (0.5) are rejected as off-topic.
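The table can be read as a step function from match count to score. The sketch below encodes it with the thresholds held in a dict that mirrors relevance.thresholds. The whole-word, case-insensitive matching rule and all function names are assumptions; the source does not say how matches are counted.

```python
import re
from typing import Dict, Iterable

# Mirrors relevance.thresholds in src/config/demo.yaml.
THRESHOLDS: Dict[str, float] = {"high": 0.9, "medium": 0.75, "low": 0.5, "minimal": 0.1}


def count_matches(query: str, keywords: Iterable[str]) -> int:
    """Count distinct keywords present as whole words (assumed matching rule)."""
    text = query.lower()
    unique = {kw.lower() for kw in keywords}
    return sum(1 for kw in unique if re.search(rf"\b{re.escape(kw)}\b", text))


def relevance_score(matches: int, thresholds: Dict[str, float] = THRESHOLDS) -> float:
    """Translate a match count into the score listed in the table."""
    if matches >= 3:
        return thresholds["high"]
    if matches == 2:
        return thresholds["medium"]
    if matches == 1:
        return thresholds["low"]
    return thresholds["minimal"]


def is_on_topic(score: float, thresholds: Dict[str, float] = THRESHOLDS) -> bool:
    """A score at or above the low threshold is accepted."""
    return score >= thresholds["low"]
```

With these values a single keyword match is enough to pass, since the low threshold is both the score for one match and the rejection boundary.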
Domain Plugin Methods
Validation patterns are loaded from the active domain plugin (DomainPluginV3):
| Method | Returns | Purpose |
|---|---|---|
| get_demo_keywords() | List[str] | Domain-specific keywords for relevance scoring |
| get_hard_reject_patterns() | List[str] | Regex patterns for hard rejection |
| get_wrong_scenario_patterns() | List[Tuple[str, str]] | (pattern, scenario_name) pairs for wrong-scenario detection |
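As an illustration of the return shapes in this table, the following sketch defines a stand-in plugin and a helper that pre-compiles its patterns. The class ExampleDemoPlugin, the helper compile_patterns, and every keyword and pattern are hypothetical. The class deliberately does not subclass DomainPluginV3, whose base interface is not shown here.

```python
import re
from typing import List, Pattern, Tuple


class ExampleDemoPlugin:
    """Hypothetical plugin exposing the three demo-validation methods."""

    def get_demo_keywords(self) -> List[str]:
        return ["mvcc", "vacuum", "wal", "planner"]

    def get_hard_reject_patterns(self) -> List[str]:
        return [r"ignore\s+(all\s+)?previous\s+instructions"]

    def get_wrong_scenario_patterns(self) -> List[Tuple[str, str]]:
        return [(r"\b(cve|exploit)\b", "security"), (r"\brefactor\b", "refactoring")]


def compile_patterns(
    plugin: ExampleDemoPlugin,
) -> Tuple[List[Pattern[str]], List[Tuple[Pattern[str], str]]]:
    """Compile regexes once so each request only runs matches."""
    hard = [re.compile(p, re.IGNORECASE) for p in plugin.get_hard_reject_patterns()]
    wrong = [
        (re.compile(p, re.IGNORECASE), name)
        for p, name in plugin.get_wrong_scenario_patterns()
    ]
    return hard, wrong
```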
Caching
The demo endpoint caches responses to reduce LLM calls:
| Cache | Size | TTL | Key |
|---|---|---|---|
| Response cache | 100 entries (LRU) | 30 minutes | query.lower().strip() |
Note: The response cache is checked before validation. Repeated queries (even off-topic ones that were previously processed) return cached results without re-validation.
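The cache parameters in the table (LRU eviction, a TTL, and a normalized-query key) can be sketched with the standard library alone. This is an illustrative implementation under those stated parameters, not the one used by the endpoint; the class name TTLLRUCache and the injectable clock are assumptions added to make the behavior easy to test.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class TTLLRUCache:
    """LRU cache whose entries also expire after ttl_seconds."""

    def __init__(self, maxsize: int = 100, ttl_seconds: float = 1800,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[str, Tuple[float, dict]]" = OrderedDict()

    @staticmethod
    def make_key(query: str) -> str:
        # Same normalization as the Key column: query.lower().strip()
        return query.lower().strip()

    def get(self, query: str) -> Optional[dict]:
        key = self.make_key(query)
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl:
            del self._data[key]  # expired entry
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, query: str, value: dict) -> None:
        key = self.make_key(query)
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used
```

One consequence of a key built only from the query text, as the table describes it, is that requests differing only in the language field would map to the same entry.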
Configuration
config.yaml
api:
  demo:
    enabled: true            # Enable/disable demo endpoint
    rate_limit: 30/minute    # Rate limit per IP
    max_query_length: 500    # Max query length in characters
    allowed_scenarios:       # Allowed scenario IDs
      - onboarding
src/config/demo.yaml
Separate configuration for caching and relevance scoring:
cache:
  dynamic_response:
    maxsize: 100        # LRU cache size for responses
    ttl_seconds: 1800   # 30 minutes
  judge:
    maxsize: 1000       # LRU cache size (reserved for future LLM judge)
    ttl_seconds: 3600   # 1 hour
llm_judge:
  temperature: 0.1      # Reserved for future LLM judge implementation
  max_tokens: 50
  model: yandexgpt-lite
relevance:
  thresholds:
    high: 0.9
    medium: 0.75
    low: 0.5            # Rejection boundary
    minimal: 0.1
Environment Variables
| Variable | Default | Description |
|---|---|---|
| DEMO_ENABLED | true | Enable/disable demo endpoint |
| DEMO_RATE_LIMIT | 30/minute | Rate limit per IP |
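A minimal sketch of reading these two variables with their documented defaults follows. The accepted boolean spellings, the supported period names, the function names, and the keys of the returned dict are all assumptions; the source does not describe how the service parses these values or how they interact with config.yaml.

```python
import os
from typing import Dict, Mapping, Tuple, Union

# Assumed period names; only "minute" appears in the documentation.
_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True (assumed convention)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def parse_rate_limit(value: str) -> Tuple[int, int]:
    """Parse '30/minute' into (30, 60): request count and window in seconds."""
    count, _, period = value.strip().partition("/")
    return int(count), _PERIOD_SECONDS[period.lower()]


def load_demo_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[bool, int]]:
    """Read DEMO_ENABLED and DEMO_RATE_LIMIT, falling back to the defaults."""
    requests, window = parse_rate_limit(env.get("DEMO_RATE_LIMIT", "30/minute"))
    return {
        "enabled": parse_bool(env.get("DEMO_ENABLED", "true")),
        "rate_limit_requests": requests,
        "rate_limit_window_s": window,
    }
```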
Pydantic Models
from pydantic import BaseModel, Field

class DemoRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=500, description="User query")
    language: str = Field(default="ru", description="Response language")

class DemoResponse(BaseModel):
    answer: str = Field(..., description="Response from the system")
    scenario_id: str = Field(default="onboarding", description="Scenario used")
    processing_time_ms: float = Field(..., description="Processing time in milliseconds")
Usage Examples
curl
# Valid query
curl -X POST http://localhost:8000/api/v1/demo/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "How does MVCC work?", "language": "ru"}'

# Check status
curl http://localhost:8000/api/v1/demo/status
JavaScript (landing page)
const response = await fetch('/api/v1/demo/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: document.getElementById('demo-input').value,
    language: 'ru'
  })
});
const data = await response.json();

if (response.ok) {
  if (data.scenario_id === 'demo_rejection') {
    // Query rejected: show friendly guidance
    showRejection(data.answer);
  } else if (data.scenario_id === 'error') {
    showError(data.answer);
  } else {
    showAnswer(data.answer);
  }
} else if (response.status === 429) {
  showRateLimit();
}
Security Considerations
- No authentication: the endpoint is publicly accessible
- Rate limiting: 30 requests/minute per IP limits abuse
- Scenario restriction: only the onboarding scenario is allowed (no security analysis, file editing, etc.)
- Query validation: the 3-stage pipeline blocks malicious and off-topic queries
- Read-only: no write operations are possible through the demo endpoint
- Domain keywords: loaded from the domain plugin, not hardcoded
Related Documentation
Module: src/api/routers/demo.py
Config: src/config/demo.yaml, config.yaml → api.demo
Last updated: March 2026