- Depends on the budget and model. With `budget=30` and `gpt-4o`, expect 5-15 minutes.
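  For reference, those two knobs map onto a run configuration along these lines (the exact request shape is an assumption; only the `budget` value and model name come from this FAQ):

  ```json
  {
    "budget": 30,
    "model": "gpt-4o"
  }
  ```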
- Dataset size must equal `minibatchSize + paretoSize`. The recommended minimum is 4 items (2 feedback + 2 pareto).
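  As a sketch, a minimal setup satisfying this constraint could look like the following. `minibatchSize`, `paretoSize`, `input`, and `expectedOutput` are the fields this FAQ references; the top-level `dataset` key and all values are illustrative assumptions:

  ```json
  {
    "minibatchSize": 2,
    "paretoSize": 2,
    "dataset": [
      { "input": "Summarize: photosynthesis converts light into chemical energy.", "expectedOutput": "Plants turn sunlight into stored chemical energy." },
      { "input": "Summarize: the water cycle moves water through evaporation and rain.", "expectedOutput": "Water continuously evaporates, condenses, and falls as precipitation." },
      { "input": "Summarize: supply and demand set market prices.", "expectedOutput": "Prices rise when demand exceeds supply and fall when it does not." },
      { "input": "Summarize: vaccines train the immune system.", "expectedOutput": "Vaccines expose the immune system to harmless antigens so it can recognize real pathogens." }
    ]
  }
  ```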
- Executions run until completion or budget exhaustion. Monitor progress via `get_single_execution`.
- Common issues (a worked example follows this list):
  - Dataset size doesn’t match `minibatchSize + paretoSize`
  - Using models that are not supported
  - Missing `model` field in the evaluator JSON
  - Invalid evaluator JSON format
  - Missing `input` or `expectedOutput` in dataset items
  - `expectedOutput` is an empty string
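  For example, the first item below triggers the empty-`expectedOutput` issue, while the second is well formed (values are illustrative):

  ```json
  [
    { "input": "Explain closures in JavaScript.", "expectedOutput": "" },
    { "input": "Explain closures in JavaScript.", "expectedOutput": "A closure captures variables from its defining scope." }
  ]
  ```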
- Supported models:
  - `gpt-5-mini` - Latest GPT-5 Mini
  - `gpt-5` - Latest GPT-5
  - `gpt-4.1` - GPT-4.1
  - `gpt-4o` - GPT-4o
  - `gpt-4o-mini` - GPT-4o Mini
- Currently only `RPM` (Reflective Prompt Mutation) is available. RPM works well for most use cases, including pedagogical prompts, support responses, and code review templates.
- Yes, the evaluator JSON must include a `model` field (e.g., `"model": "gpt-4o-mini"`). Only the `model` key is used; any additional evaluator keys are ignored.
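  Since only the `model` key is read, a minimal valid evaluator JSON reduces to that single field:

  ```json
  {
    "model": "gpt-4o-mini"
  }
  ```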