Chat Completions
POST /v1/chat/completions
Generate text from a conversation. Supports streaming, vision, and structured output.
Request
curl https://inference.dria.co/v1/chat/completions \
-H "Authorization: Bearer dkn_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5:9b",
"messages": [
{"role": "user", "content": "explain quantum computing in one sentence"}
]
}'
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | Model ID (e.g., qwen3.5:9b) |
| messages | array | yes | Conversation messages |
| max_tokens | integer | no | Max tokens to generate (default: 2048) |
| temperature | float | no | Sampling temperature (default: 0.7) |
| stream | boolean | no | Enable SSE streaming (default: false) |
| timeout_secs | integer | no | Timeout in seconds (default: 120) |
| response_format | object | no | Structured output schema |
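As a rough illustration of how these fields fit together, here is a minimal Python sketch that assembles a request body and posts it with the standard library. The helper names `build_chat_body` and `post_chat` are hypothetical, and the extra ten seconds of client-side timeout is an arbitrary choice, not something the API requires.

```python
import json
import urllib.request

API_URL = "https://inference.dria.co/v1/chat/completions"

# Optional fields taken from the parameter table in this section.
OPTIONAL_FIELDS = {
    "max_tokens", "temperature", "stream", "timeout_secs", "response_format",
}


def build_chat_body(model, messages, **options):
    """Return the JSON body, keeping only optional fields that were set."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    body = {"model": model, "messages": messages}
    # Omit unset options so the server-side defaults apply.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


def post_chat(body, api_key):
    """POST a non-streaming request and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # Assumption: wait a little longer than the requested server-side timeout.
    client_timeout = body.get("timeout_secs", 120) + 10
    with urllib.request.urlopen(request, timeout=client_timeout) as response:
        return json.load(response)
```

A call might look like `post_chat(build_chat_body("qwen3.5:9b", messages, max_tokens=256), api_key)`, where `api_key` is your `dkn_live_...` key.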
Each message has a role and content:
{"role": "system", "content": "You are a helpful assistant"}
{"role": "user", "content": "Hello"}
{"role": "assistant", "content": "Hi! How can I help?"}
{"role": "user", "content": "What is Rust?"}
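For multi-turn conversations, the client resends the whole history on each request. The sketch below shows one way to keep that history in order; `add_turn` is a hypothetical helper, and it only accepts the three roles shown above, which is an assumption (the API may accept others).

```python
# Roles that appear in the examples above; the API may support more.
KNOWN_ROLES = {"system", "user", "assistant"}


def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    if role not in KNOWN_ROLES:
        raise ValueError(f"unexpected role: {role!r}")
    # Build a new list so the caller's history is never mutated in place.
    return history + [{"role": role, "content": content}]


history = []
history = add_turn(history, "system", "You are a helpful assistant")
history = add_turn(history, "user", "Hello")
# After each response, append the assistant reply before the next user turn.
history = add_turn(history, "assistant", "Hi! How can I help?")
history = add_turn(history, "user", "What is Rust?")
```

The resulting `history` list can be passed directly as the `messages` field.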
Vision (multimodal)
For vision models, content can be an array of parts:
{
"role": "user",
"content": [
{"type": "text", "text": "Describe this image"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
]
}
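The `url` value above is a base64 data URL. A minimal sketch of building one from raw image bytes follows; `image_part` and `vision_message` are hypothetical helper names, and the default MIME type simply mirrors the JPEG example.

```python
import base64


def image_part(image_bytes, mime_type="image/jpeg"):
    """Wrap raw image bytes as an image_url content part with a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt, image_bytes, mime_type="image/jpeg"):
    """Build a user message holding a text part followed by an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part(image_bytes, mime_type),
        ],
    }
```

In practice the bytes would come from a file, for example `open("photo.jpg", "rb").read()`.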
Structured output
Use response_format to get JSON conforming to a schema:
{
"model": "qwen3.5:9b",
"messages": [{"role": "user", "content": "John Doe, john@example.com, 30"}],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "extract",
"schema": {
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "email", "age"]
}
}
}
}
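On the client side, the structured result still needs decoding. The sketch below assumes the JSON arrives as a string in `choices[0].message.content`, which this page does not state explicitly; `parse_structured` is a hypothetical helper that also re-checks the required keys.

```python
import json


def parse_structured(response, required_keys):
    """Decode the JSON string in the first choice and check required keys."""
    # Assumption: structured output is returned as a JSON-encoded string.
    content = response["choices"][0]["message"]["content"]
    data = json.loads(content)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Hand-written sample response for illustration, not real API output.
sample = {
    "choices": [
        {
            "message": {
                "content": '{"name": "John Doe", "email": "john@example.com", "age": 30}'
            }
        }
    ]
}
person = parse_structured(sample, ["name", "email", "age"])
```

Checking the keys again locally is a defensive choice; it costs little and catches a malformed reply early.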
Response
{
"id": "gen-abc123",
"model": "qwen3.5:9b",
"choices": [
{
"message": {
"content": "Quantum computing uses quantum bits..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 42,
"total_tokens": 57
},
"metadata": {
"node_id": "node-xyz"
}
}
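A small sketch of reading the fields shown above from a decoded response. `ChatResult` and `summarize` are hypothetical names, and the code assumes every field in the sample is present, which a real response may not guarantee.

```python
from dataclasses import dataclass


@dataclass
class ChatResult:
    text: str
    finish_reason: str
    total_tokens: int
    node_id: str


def summarize(response):
    """Pull the reply text, finish reason, token total, and node ID."""
    choice = response["choices"][0]
    return ChatResult(
        text=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        total_tokens=response["usage"]["total_tokens"],
        node_id=response["metadata"]["node_id"],
    )


# The sample response from this section, as a Python dict.
sample = {
    "id": "gen-abc123",
    "model": "qwen3.5:9b",
    "choices": [
        {
            "message": {"content": "Quantum computing uses quantum bits..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 42, "total_tokens": 57},
    "metadata": {"node_id": "node-xyz"},
}
result = summarize(sample)
```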
Streaming
Set "stream": true to receive Server-Sent Events:
curl https://inference.dria.co/v1/chat/completions \
-H "Authorization: Bearer dkn_live_..." \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5:9b",
"messages": [{"role": "user", "content": "hello"}],
"stream": true
}'
Each event is a data: line with a JSON chunk:
data: {"choices":[{"delta":{"content":"Hello"}}]}
data: {"choices":[{"delta":{"content":"!"}}]}
data: {"choices":[{"delta":{"content":" How"}}]}
data: [DONE]
The stream ends with data: [DONE].
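To show how a client might consume these events, here is a minimal sketch that reads `data:` lines, collects the `delta.content` fragments, and stops at `[DONE]`. `iter_deltas` is a hypothetical helper; it accepts any iterable of lines (text or bytes), so it can be fed from an HTTP response object or from a list for testing.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# The example events from this section.
events = [
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":"!"}}]}',
    'data: {"choices":[{"delta":{"content":" How"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_deltas(events))
```

Whitespace inside a fragment (such as the leading space in `" How"`) is preserved because it sits inside the JSON string, so joining the fragments reproduces the text as generated.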