# dria batch
Run parallel inference on a JSONL file. Dria automatically distributes work across available models and handles retries with exponential backoff.

## Basic usage
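A minimal invocation sketch, using the flags documented in the options table below (the input/output filenames and the model name are illustrative, not part of the spec):

```shell
# Run every prompt in prompts.jsonl, letting Dria auto-select models,
# and write results to results.jsonl
dria batch prompts.jsonl -o results.jsonl

# Pin a model and raise concurrency (model name is a placeholder)
dria batch prompts.jsonl -m llama3.1:8b -c 25 -o results.jsonl
```

Without `-o`, results go to stdout.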
## Input format
Each line is a JSON object with a `prompt` field; `id` and `attachment` fields are optional:
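For example (the field names are those described above; whether `attachment` is a local path or a URL is not specified here, so treat the value as illustrative):

```jsonl
{"prompt": "Summarize the history of the metric system in two sentences."}
{"id": "q-42", "prompt": "Translate to French: good morning"}
{"id": "img-1", "prompt": "What is in this image?", "attachment": "photo.jpg"}
```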
## Output format
Results are written as JSONL. Each line contains the model used, the output text, and the token count; failed items contain an `error` field instead of `output`:
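A sketch of what result lines might look like; the exact key names are not documented in this section, so these are illustrative:

```jsonl
{"id": "q-42", "model": "llama3.1:8b", "output": "Bonjour", "tokens": 4}
{"id": "img-1", "error": "503: no vision nodes available"}
```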
## Auto model selection
When you don’t specify `-m`, Dria:
- Fetches all available models and their node counts
- Classifies each prompt by content type (text, vision, audio) based on the attachment
- Distributes prompts across models proportionally to node availability
- If a model goes down (503), automatically falls back to the next best model
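For instance, a single mixed input file can hold all three content types, and each line is routed by its attachment (presumably classified by file type; the filenames are illustrative): the first line below would go to a text model, the second to a vision model, and the third to an audio model.

```jsonl
{"prompt": "Explain quicksort in one paragraph"}
{"prompt": "Describe this chart", "attachment": "q3-revenue.png"}
{"prompt": "Transcribe this clip", "attachment": "meeting.wav"}
```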
## Structured output in batch
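A hedged sketch, assuming `--schema` accepts a comma-separated field list (the exact field syntax may differ; filenames are illustrative):

```shell
# Request the same structured fields for every prompt in the batch
dria batch reviews.jsonl --schema "sentiment,score" -o scored.jsonl

# Or supply a full JSON schema file instead
dria batch reviews.jsonl --schema-file review.schema.json -o scored.jsonl
```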
Apply structured output to all prompts with `--schema` or `--schema-file`.

## Options
| Option | Description | Default |
|---|---|---|
| `-m, --model <model>` | Model to use (auto-selects if omitted) | auto |
| `-o, --output <file>` | Output JSONL file | stdout |
| `-c, --concurrency <n>` | Max parallel requests | 10 |
| `--schema <fields>` | Structured output fields | — |
| `--schema-file <path>` | JSON schema file | — |
| `--retries <n>` | Max retries per failed item | 3 |
| `--max-tokens <n>` | Max tokens per request | 2048 |
| `--temperature <t>` | Sampling temperature | 0.7 |
| `--json` | Suppress spinners (for piping) | false |
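Because `--json` suppresses spinners, results can be piped cleanly. For example, with results on stdout (no `-o`) and `jq` extracting one field per line (the `output` key name is illustrative, as above):

```shell
dria batch prompts.jsonl --json | jq -r '.output'
```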