Demo API

Public, unauthenticated API behind the landing page “Try it yourself” feature. Requests are rate-limited per IP and restricted to the onboarding scenario.

Endpoints

POST /api/v1/demo/chat

Send a natural-language query about the demo codebase (PostgreSQL 17.6).

Authentication: None (public endpoint)
Rate limit: 30 requests/minute per IP

Request

{
  "query": "Where is the main function?",
  "language": "en"
}

Field	Type	Required	Default	Description
`query`	string	Yes	—	Natural-language question (1–500 chars)
`language`	string	No	`"ru"`	Response language (`"en"` or `"ru"`)
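
A client can check the documented constraints before sending a request. The sketch below is a hypothetical helper (`build_demo_payload` is not part of the API) that assumes only the limits listed in the table above; the server remains the source of truth for validation.

```python
import json

ALLOWED_LANGUAGES = {"en", "ru"}   # values listed in the field table
MAX_QUERY_LENGTH = 500             # documented upper bound for `query`


def build_demo_payload(query: str, language: str = "ru") -> str:
    """Return a JSON request body, raising ValueError on obvious violations."""
    if not 1 <= len(query) <= MAX_QUERY_LENGTH:
        raise ValueError(f"query must be 1-{MAX_QUERY_LENGTH} characters")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(ALLOWED_LANGUAGES)}")
    # ensure_ascii=False keeps Russian text readable in the serialized body
    return json.dumps({"query": query, "language": language}, ensure_ascii=False)
```

The returned string can be sent as the POST body with a `Content-Type: application/json` header.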

Response (200)

Successful query:

{
  "answer": "The main function is defined in src/backend/main/main.c at line 53...",
  "scenario_id": "onboarding",
  "processing_time_ms": 234.5
}

Rejected query (off-topic, wrong scenario, or blocked content):

{
  "answer": "### Request Blocked\n\nThis type of request is not supported in the demo version...",
  "scenario_id": "demo_rejection",
  "processing_time_ms": 12.3
}

Internal error (LLM unavailable, etc.):

{
  "answer": "Sorry, the analysis system is temporarily unavailable. Please try again later.",
  "scenario_id": "error",
  "processing_time_ms": 1502.7
}

Field	Type	Description
`answer`	string	LLM-generated response, rejection message, or error text
`scenario_id`	string	`"onboarding"` — success, `"demo_rejection"` — query rejected, `"error"` — internal error
`processing_time_ms`	float	Server-side processing time in milliseconds

Error Responses

Status	Cause	Body
400	Query too long (>500 chars)	`{"detail": "Query too long. Maximum length is 500 characters."}`
422	Pydantic validation failure (empty query, wrong types)	`{"detail": [...]}`
429	Rate limit exceeded	`{"detail": "Too many requests"}`
503	Demo mode disabled	`{"detail": "Demo endpoint is currently disabled"}`

Note: Off-topic and wrong-scenario queries return HTTP 200 with scenario_id: "demo_rejection" and a friendly message in the answer field. They are NOT HTTP errors — this allows the landing page to display helpful guidance without triggering error handlers.
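
Because rejections and internal errors arrive with HTTP 200, a client has to inspect `scenario_id` as well as the status code. The following is a minimal sketch of that decision logic; the function name and the outcome labels are illustrative assumptions, not part of the API.

```python
from typing import Any, Dict


def classify_demo_response(status: int, body: Dict[str, Any]) -> str:
    """Map an HTTP status and parsed JSON body to a UI outcome label."""
    if status == 200:
        scenario = body.get("scenario_id")
        if scenario == "demo_rejection":
            return "rejected"        # show body["answer"] as guidance
        if scenario == "error":
            return "backend_error"   # internal error reported with HTTP 200
        return "answer"
    if status == 429:
        return "rate_limited"
    if status == 503:
        return "disabled"
    if status in (400, 422):
        return "invalid_request"
    return "unexpected"
```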

GET /api/v1/demo/status

Check demo endpoint availability and configuration.

Authentication: None

Response (200)

{
  "enabled": true,
  "rate_limit": "30/minute",
  "max_query_length": 500,
  "allowed_scenarios": ["onboarding"]
}

Query Validation Pipeline

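Incoming queries pass through a 3-stage validation pipeline before processing. The rejection_reason values shown here are recorded on the internal ValidationResult; the documented response body has no rejection_reason field: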

1. HARD REJECT  — explicit malicious content (regex patterns from domain plugin)
   └─ Returns 200 with scenario_id="demo_rejection", rejection_reason="blocked_content"

2. WRONG SCENARIO — legitimate but outside onboarding scope
   └─ Returns 200 with scenario_id="demo_rejection", rejection_reason="wrong_scenario"

3. DOMAIN RELEVANCE — keyword/pattern scoring against domain plugin
   └─ Score ≥ 0.5 (LOW threshold) → accepted, forwarded to onboarding handler
   └─ Score < 0.5 → rejected with scenario_id="demo_rejection", rejection_reason="off_topic"
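
The stage ordering above can be sketched as a single function. This is an illustration under stated assumptions, not the code in demo.py: `validate_query` and its parameters are hypothetical, the relevance scorer is passed in as a callable, and the confidence of 1.0 for regex rejections is a placeholder.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ValidationResult:
    is_valid: bool
    confidence: float
    rejection_reason: Optional[str] = None
    detected_scenario: Optional[str] = None
    method: str = "keyword"


def validate_query(
    query: str,
    hard_reject_patterns: List[str],
    wrong_scenario_patterns: List[Tuple[str, str]],
    score_fn: Callable[[str], float],
    low_threshold: float = 0.5,
) -> ValidationResult:
    # Stage 1: explicit malicious content is rejected outright.
    for pattern in hard_reject_patterns:
        if re.search(pattern, query, re.IGNORECASE):
            return ValidationResult(False, 1.0, rejection_reason="blocked_content")

    # Stage 2: legitimate queries that belong to another scenario.
    for pattern, scenario_name in wrong_scenario_patterns:
        if re.search(pattern, query, re.IGNORECASE):
            return ValidationResult(
                False, 1.0,
                rejection_reason="wrong_scenario",
                detected_scenario=scenario_name,
            )

    # Stage 3: domain relevance; the low threshold is the rejection boundary.
    score = score_fn(query)
    if score >= low_threshold:
        return ValidationResult(True, score)
    return ValidationResult(False, score, rejection_reason="off_topic")
```

Running the cheap regex checks first means a blocked or wrong-scenario query never reaches relevance scoring.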

ValidationResult

The internal ValidationResult dataclass (demo.py:122):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    is_valid: bool
    confidence: float
    rejection_reason: Optional[str] = None  # "off_topic" | "wrong_scenario" | "blocked_content"
    detected_scenario: Optional[str] = None
    method: str = "keyword"

Relevance Thresholds

Configured in src/config/demo.yaml → relevance.thresholds:

Threshold	Value	Trigger
`high`	0.9	3+ keyword matches
`medium`	0.75	2 keyword matches
`low`	0.5	1 keyword match — rejection boundary
`minimal`	0.1	0 keyword matches

Queries with score below low (0.5) are rejected as off-topic.
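
The table reads as a mapping from keyword match count to score. A minimal sketch of that mapping follows, assuming the threshold values shown above; the helper names are hypothetical and the real scorer may weigh patterns differently.

```python
from typing import Mapping

# Values from relevance.thresholds in src/config/demo.yaml
THRESHOLDS = {"high": 0.9, "medium": 0.75, "low": 0.5, "minimal": 0.1}


def score_for_matches(match_count: int, thresholds: Mapping[str, float] = THRESHOLDS) -> float:
    """Map a keyword match count to a relevance score (hypothetical helper)."""
    if match_count >= 3:
        return thresholds["high"]
    if match_count == 2:
        return thresholds["medium"]
    if match_count == 1:
        return thresholds["low"]
    return thresholds["minimal"]


def is_on_topic(match_count: int, thresholds: Mapping[str, float] = THRESHOLDS) -> bool:
    """A query is accepted when its score reaches the low threshold."""
    return score_for_matches(match_count, thresholds) >= thresholds["low"]
```

With these values a single keyword match is enough to pass, and only queries with no matches fall below the boundary.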

Domain Plugin Methods

Validation patterns are loaded from the active domain plugin (DomainPluginV3):

Method	Returns	Purpose
`get_demo_keywords()`	`List[str]`	Domain-specific keywords for relevance scoring
`get_hard_reject_patterns()`	`List[str]`	Regex patterns for hard rejection
`get_wrong_scenario_patterns()`	`List[Tuple[str, str]]`	`(pattern, scenario_name)` pairs for wrong-scenario detection

Caching

The demo endpoint caches responses to reduce LLM calls:

Cache	Size	TTL	Key
Response cache	100 entries (LRU)	30 minutes	`query.lower().strip()`

Note: The response cache is checked before validation. Repeated queries (even off-topic ones that were previously processed) return cached results without re-validation.
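
To make the cache semantics concrete, here is a small LRU cache with a per-entry TTL that uses the documented key normalization. It is a sketch only: the class name, the injectable clock, and the eviction details are assumptions, and the actual module may rely on a library cache instead.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class ResponseCache:
    """LRU cache with per-entry TTL, keyed by the normalized query."""

    def __init__(self, maxsize: int = 100, ttl_seconds: float = 1800,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._maxsize = maxsize
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: "OrderedDict[str, Tuple[float, dict]]" = OrderedDict()

    @staticmethod
    def make_key(query: str) -> str:
        return query.lower().strip()  # documented cache key

    def get(self, query: str) -> Optional[dict]:
        key = self.make_key(query)
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]       # entry expired
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, query: str, value: dict) -> None:
        key = self.make_key(query)
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict least recently used
```

Because the key ignores case and surrounding whitespace, `"  How does MVCC work? "` and `"how does mvcc work?"` share one entry.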

Configuration

config.yaml

api:
  demo:
    enabled: true                    # Enable/disable demo endpoint
    rate_limit: 30/minute            # Rate limit per IP
    max_query_length: 500            # Max query length in characters
    allowed_scenarios:               # Allowed scenario IDs
      - onboarding

src/config/demo.yaml

Separate configuration for caching and relevance scoring:

cache:
  dynamic_response:
    maxsize: 100           # LRU cache size for responses
    ttl_seconds: 1800      # 30 minutes
  judge:
    maxsize: 1000          # LRU cache size (reserved for future LLM judge)
    ttl_seconds: 3600      # 1 hour

llm_judge:
  temperature: 0.1         # Reserved for future LLM judge implementation
  max_tokens: 50
  model: yandexgpt-lite

relevance:
  thresholds:
    high: 0.9
    medium: 0.75
    low: 0.5               # Rejection boundary
    minimal: 0.1

Environment Variables

Variable	Default	Description
`DEMO_ENABLED`	`true`	Enable/disable demo endpoint
`DEMO_RATE_LIMIT`	`30/minute`	Rate limit per IP
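
Reading these variables with their documented defaults might look like the sketch below. The helper name and the set of accepted truthy spellings are assumptions, and how the variables take precedence over config.yaml is not specified here.

```python
import os
from typing import Mapping, Tuple


def load_demo_settings(env: Mapping[str, str] = os.environ) -> Tuple[bool, str]:
    """Return (enabled, rate_limit) using the documented defaults."""
    raw_enabled = env.get("DEMO_ENABLED", "true")
    # Assumption: common truthy spellings enable the endpoint.
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    rate_limit = env.get("DEMO_RATE_LIMIT", "30/minute")
    return enabled, rate_limit
```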

Pydantic Models

from pydantic import BaseModel, Field

class DemoRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=500, description="User query")
    language: str = Field(default="ru", description="Response language")

class DemoResponse(BaseModel):
    answer: str = Field(..., description="Response from the system")
    scenario_id: str = Field(default="onboarding", description="Scenario used")
    processing_time_ms: float = Field(..., description="Processing time in milliseconds")

Usage Examples

curl

# Valid query
curl -X POST http://localhost:8000/api/v1/demo/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "How does MVCC work?", "language": "ru"}'

# Check status
curl http://localhost:8000/api/v1/demo/status

JavaScript (landing page)

const response = await fetch('/api/v1/demo/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: document.getElementById('demo-input').value,
    language: 'ru'
  })
});

const data = await response.json();

if (response.ok) {
  if (data.scenario_id === 'demo_rejection') {
    // Query rejected — show friendly guidance
    showRejection(data.answer);
  } else if (data.scenario_id === 'error') {
    showError(data.answer);
  } else {
    showAnswer(data.answer);
  }
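} else if (response.status === 429) {
  showRateLimit();
} else {
  // 400, 422, 503, and any other non-2xx status
  showError(`Request failed (HTTP ${response.status})`);
}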

Security Considerations

No authentication — endpoint is publicly accessible
Rate limiting — 30 requests/minute per IP prevents abuse
Scenario restriction — only onboarding scenario allowed (no security analysis, file editing, etc.)
Query validation — 3-stage pipeline blocks malicious and off-topic queries
Read-only — no write operations possible through demo endpoint
Domain keywords — loaded from domain plugin, not hardcoded

Related Documentation

REST API — full API reference
Scenarios — S01 Onboarding scenario details

Module: src/api/routers/demo.py
Config: src/config/demo.yaml, config.yaml → api.demo
Last updated: March 2026
