
EvaluatePrediction

EvaluatePrediction is a Singleton task that evaluates whether a predicted answer is contextually and semantically correct with respect to the given question and context.

Inputs

  • prediction (str): The predicted answer to evaluate.
  • question (str): The question the prediction is answering.
  • context (str): The context the prediction is judged against.

Outputs

  • evaluation (str): The evaluation verdict, e.g. "[correct]".
  • model (str): The model that produced the evaluation.

Example

Evaluate a prediction against a question and context. This example runs the task with GPT-4o.

import os
import asyncio
from dria.factory import EvaluatePrediction
from dria.client import Dria
from dria.models import Task, Model

# Initialize the Dria client with your RPC token.
dria = Dria(rpc_token=os.environ["DRIA_RPC_TOKEN"])

async def evaluate():
    evaluator = EvaluatePrediction()
    # Build the evaluation workflow and execute it on the Dria network.
    res = await dria.execute(
        Task(
            workflow=evaluator.workflow(
                prediction="The capital of France is Paris.",
                question="What is the capital of France?",
                context="France is a country in Western Europe. Its capital city is Paris."
            ).model_dump(),
            models=[Model.GPT4O]
        ),
        timeout=45,
    )
    # Parse the raw task output into the evaluation/model fields.
    return evaluator.parse_result(res)

def main():
    result = asyncio.run(evaluate())
    print(result)

if __name__ == "__main__":
    main()

Expected output

{
  "evaluation": "[correct]",
  "model": "gpt-4o"
}
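
The evaluation field is a bracketed verdict string. If you need to branch on it programmatically, here is a minimal sketch that assumes only the output shape shown above; the is_correct helper is illustrative and not part of the Dria API.

def is_correct(result: dict) -> bool:
    verdict = result["evaluation"].strip().lower()
    # Guard against verdicts such as "[incorrect]" that also contain "correct".
    return "correct" in verdict and "incorrect" not in verdict

print(is_correct({"evaluation": "[correct]", "model": "gpt-4o"}))  # True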