Instruction Backtranslation

InstructionBacktranslation is a Singleton task that scores a generation against its instruction on a 1-5 scale and gives a reason for the score.

Inputs

instruction (str): The reference instruction to evaluate the text output.
generation (str): The text output to evaluate for the given instruction.
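Since both inputs are plain strings, it can help to check the rows before handing them to an executor. The following is a minimal sketch, not part of the Dria API: `validate_rows` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration, and the only assumption taken from this page is that each row is a dict with `instruction` and `generation` string fields.

```python
from typing import Any

# Field names taken from the Inputs section; the helper itself is hypothetical.
REQUIRED_KEYS = ("instruction", "generation")


def validate_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the rows unchanged if every row has non-empty string inputs."""
    for index, row in enumerate(rows):
        for key in REQUIRED_KEYS:
            value = row.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"row {index}: '{key}' must be a non-empty string")
    return rows


rows = validate_rows(
    [{"instruction": "What is 3 times 20?", "generation": "It's 60."}]
)
```

A list that passes this check has the same shape as the one passed to load_instructions in the example on this page.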

Outputs

score (str): The score for the generation based on the given instruction.
reason (str): The reason for the provided score.
model_name (str): The model name used to score the generation.

Example

We'll use ParallelSingletonExecutor to run multiple InstructionBacktranslation tasks in parallel across the models passed to set_models.

from dria.client import Dria
from dria.factory import InstructionBacktranslation
from dria.models import Model
from dria.batches import ParallelSingletonExecutor
import asyncio
import json

async def batch():
    dria_client = Dria()
    singleton = InstructionBacktranslation()
    # The executor runs the singleton task over every loaded input.
    executor = ParallelSingletonExecutor(dria_client, singleton)
    executor.set_models([Model.GPT4O])
    # Each entry pairs an instruction with the generation to be scored.
    executor.load_instructions(
        [
            {
                "instruction": "What is 3 times 20?",
                "generation": "It's 60.",
            },
            {
                "instruction": "What is 3 times 20?",
                "generation": "It's 59.",
            },
        ]
    )
    return await executor.run()


def main():
    results = asyncio.run(batch())
    print(json.dumps(results, indent=2))


if __name__ == "__main__":
    main()

Expected output

[
  {
    "reasoning": "The response is concise, accurate, and directly answers the user's question.  There's no unnecessary information or fluff. It's a perfect example of a simple, effective AI assistant response.",
    "score": "5",
    "instruction": "What is 3 times 20?",
    "generation": "It's 60.",
    "model": "gemini-1.5-flash"
  },
  {
    "reasoning": "The candidate answer is incorrect, as it fails to provide the correct answer to the math question \"What is 3 times 20?\" The correct response should be \"The answer is 60.\" Since the candidate answer gives an incorrect result and does not demonstrate any helpfulness or relevance to the user's request, it is a poor response overall.",
    "score": "1",
    "instruction": "What is 3 times 20?",
    "generation": "It's 59.",
    "model": "gpt-4o-mini"
  }
]
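Because the score comes back as a string, a common next step is to convert it to an integer and keep only the generations that pass a threshold. The sketch below assumes results shaped like the sample output above (a list of dicts with a `score` field holding "1" to "5"). The helpers `parse_score` and `split_by_score`, and the threshold of 4, are hypothetical choices for illustration and not part of the Dria library.

```python
def parse_score(raw: str) -> int:
    """Convert a score string such as "5" to an int and check the 1-5 range."""
    score = int(raw.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score


def split_by_score(results: list[dict], threshold: int = 4) -> tuple[list[dict], list[dict]]:
    """Split results into (accepted, rejected) by comparing scores to threshold."""
    accepted, rejected = [], []
    for result in results:
        if parse_score(result["score"]) >= threshold:
            accepted.append(result)
        else:
            rejected.append(result)
    return accepted, rejected


# Trimmed copies of the two results shown above.
sample = [
    {"score": "5", "generation": "It's 60."},
    {"score": "1", "generation": "It's 59."},
]
accepted, rejected = split_by_score(sample)
print([r["generation"] for r in accepted])  # → ["It's 60."]
```

Raising on an out-of-range or non-numeric score, rather than silently dropping the row, makes malformed model output visible early.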

References