Dria Batch Inference API
Overview
Dria’s Batch Inference API is an open-source, crowdsourced batch AI inference service optimized for massive AI workloads. It’s ideal for:
- Processing large amounts of data
- Optimizing inference costs
- Large-scale offline evaluations
- Experimenting with offline workflows
Key benefits:
- Low-cost inference for large jobs
- High throughput
- Asynchronous processing
Getting Started
1. Prepare Your Batch File
Create a .jsonl file where each line is a JSON object representing a single request. Each request must include a unique custom_id (UUID recommended), a type, a version, and a body with the model and input parameters.
Example:
- custom_id: Unique identifier for each request (UUID recommended)
- type: Task type (e.g., “completions”)
- version: API version (e.g., “v1”)
- body: Model and input parameters (see supported models below)
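The fields above can be assembled into a batch file with a short script. Note that the exact body schema depends on the task type; the chat-style messages shape below is an assumption, not a confirmed Dria schema:

```python
import json
import uuid

def make_request(prompt: str, model: str) -> dict:
    """Build one batch request line. The body layout (model + messages)
    is an assumed chat-completions shape; check the Dria docs for the
    exact schema of each task type."""
    return {
        "custom_id": str(uuid.uuid4()),  # unique per request
        "type": "completions",
        "version": "v1",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

prompts = ["Summarize the French Revolution.", "Explain TCP slow start."]

# Write one JSON object per line (JSON Lines format).
with open("batch.jsonl", "w") as f:
    for p in prompts:
        f.write(json.dumps(make_request(p, "gpt-4o-mini")) + "\n")
```

Each line is an independent JSON object, so the file can be streamed and validated line by line before upload.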
2. Upload Your Batch File (Python Example)
Dria uses a three-step upload process:
- Acquire a pre-signed URL for the upload.
- Upload your file to the provided URL.
- Complete the upload to begin processing.
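The three steps above can be sketched in Python with the requests library. The base URL, endpoint paths, and response field names (url, file_id) are illustrative assumptions, not confirmed Dria endpoints — obtain the real ones from the Dria Batch Inference page:

```python
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.dria.co/v1"  # assumed base URL, not confirmed
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def upload_batch(path: str) -> str:
    """Run the three-step upload and return the file id.
    All endpoint paths and field names below are assumptions."""
    # Step 1: acquire a pre-signed upload URL.
    resp = requests.post(f"{BASE_URL}/batch/upload-url", headers=HEADERS)
    resp.raise_for_status()
    info = resp.json()

    # Step 2: PUT the .jsonl file to the pre-signed URL.
    with open(path, "rb") as f:
        requests.put(info["url"], data=f).raise_for_status()

    # Step 3: complete the upload so processing can begin.
    done = requests.post(
        f"{BASE_URL}/batch/complete",
        headers=HEADERS,
        json={"file_id": info["file_id"]},
    )
    done.raise_for_status()
    return info["file_id"]
```

The pre-signed URL pattern keeps large file transfers off the API servers: the API only brokers the URL, and the file goes directly to object storage.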
3. Monitor and Retrieve Results
After the upload has completed, your batch job is processed asynchronously. You can then:
- Use the web interface to upload files, check batch status, and download results.
Example Output File
Each line in the output .jsonl file corresponds to a request, including the model, result, and token usage.
Example:
- Use custom_id to match input and output lines.
- Failed requests will include error information.
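Matching output lines back to inputs comes down to indexing on custom_id. The result and error field names in this sketch are assumptions about the output schema (the demo fabricates a small output file to show the pattern):

```python
import json

def index_results(output_path: str) -> dict:
    """Map each custom_id to its full record from the output .jsonl."""
    results = {}
    with open(output_path) as f:
        for line in f:
            rec = json.loads(line)
            results[rec["custom_id"]] = rec
    return results

# Demo with a fabricated output file; the "result" and "error"
# field names are assumptions, not the confirmed Dria schema.
with open("output_demo.jsonl", "w") as f:
    f.write(json.dumps({"custom_id": "a1", "result": "ok"}) + "\n")
    f.write(json.dumps({"custom_id": "b2", "error": "rate limited"}) + "\n")

by_id = index_results("output_demo.jsonl")
failed = [cid for cid, rec in by_id.items() if "error" in rec]
```

Because output order is not guaranteed to match input order in batch systems, joining on custom_id rather than line position is the safe approach.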
Limits and Plans
- Tasks per file: Up to 100,000
- Free plan: 3 files per day
- Developer plan: 10 files per day
- Enterprise plan: custom limits (reach out to inference@dria.co)
Supported Models
Dria Batch API supports a wide range of models.
- Claude 3.7 Sonnet: 'claude-3.7-sonnet'
- Claude 3.5 Sonnet: 'claude-3.5-sonnet'
- Gemini 2.5 Pro Experimental: 'gemini-2.5-pro-exp'
- Gemini 2.0 Flash: 'gemini-2.0-flash'
- Gemma 3 4B: 'gemma3:4b'
- Gemma 3 12B: 'gemma3:12b'
- Gemma 3 27B: 'gemma3:27b'
- GPT-4o-mini: 'gpt-4o-mini'
- GPT-4o: 'gpt-4o'
- Llama 3.3 70B Instruct: 'llama3.3:70b-instruct-q4_K_M'
- Llama 3.1 8B Instruct: 'llama3.1:8b-instruct-q4_K_M'
- Llama 3.2 1B Instruct: 'llama3.2:1b-instruct-q4_K_M'
- Mistral Nemo 12B: 'mixtral-nemo:12b'
For the most up-to-date list and details, always refer to the Dria Batch Inference page.
Web Interface
You can use the Dria Batch Inference Web Interface to:
- Upload .jsonl files
- Check batch status
- Download results
- Obtain your API key
Example Error Handling (Python)
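A minimal sketch of error handling around the file upload step, using the requests library. The upload URL is whatever pre-signed URL the API returned; the specific status codes mentioned in comments are typical examples, not documented Dria behavior:

```python
import requests

def safe_upload(url: str, path: str) -> bool:
    """Upload a batch file with basic error handling.
    Returns True on success, False on any failure."""
    try:
        with open(path, "rb") as f:
            resp = requests.put(url, data=f, timeout=60)
        resp.raise_for_status()
        return True
    except FileNotFoundError:
        print(f"Batch file not found: {path}")
    except requests.exceptions.HTTPError as e:
        # e.g. an auth failure (check your API key) or an
        # oversized file (check the per-file task limit)
        print(f"Server rejected upload: {e}")
    except requests.exceptions.RequestException as e:
        # covers timeouts, DNS failures, connection resets, etc.
        print(f"Network problem: {e}")
    return False
```

Separating the HTTPError case from the broader RequestException lets you distinguish "the server said no" (fix the request) from "the request never arrived" (retry is reasonable).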
FAQ
- How do I check the status of my batch job?
- Use the web interface.
- How are results delivered?
- As a downloadable .jsonl file after processing.
- What if my upload fails?
- Check your API key, file size, and network connection. Review error messages for details.
For more details or support, visit the Dria Batch Inference page.