EvaluatePrediction

Overview

EvaluatePrediction is a singleton class that evaluates the quality and correctness of a predicted answer against a given question and context. It returns detailed evaluation feedback rather than a simple boolean result.

Inputs

| Field | Type | Description |
|------------|-------|----------------------------------------------------|
| prediction | `str` | The predicted answer to be evaluated |
| question | `str` | The original question |
| context | `str` | The context against which to evaluate the prediction |
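Each input row is passed to the generator as a plain dict keyed by these three fields. A minimal illustrative payload (the values are placeholders, not from the library):

```python
# Hypothetical instruction payload matching the input fields above.
instruction = {
    "prediction": "The Eiffel Tower is in Paris.",
    "question": "Where is the Eiffel Tower?",
    "context": "The Eiffel Tower is a landmark in Paris, France.",
}

# All three fields are strings, as listed in the inputs table.
assert all(isinstance(instruction[k], str) for k in ("prediction", "question", "context"))
print(sorted(instruction))  # -> ['context', 'prediction', 'question']
```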

Outputs

| Field | Type | Description |
|------------|-------|-------------------------------------|
| question | `str` | The original question |
| prediction | `str` | The predicted answer being evaluated |
| evaluation | `str` | Detailed evaluation feedback |
| model | `str` | The AI model used for evaluation |

Usage

An EvaluatePrediction instance can be used in data generation as follows:

import asyncio

from dria import DriaDataset, DatasetGenerator, Model
from dria.factory import EvaluatePrediction

my_dataset = DriaDataset(
    name="EvaluatePrediction",
    description="A dataset for prediction evaluation",
    schema=EvaluatePrediction.OutputSchema,
)
generator = DatasetGenerator(dataset=my_dataset)

# Each instruction supplies the three input fields listed above.
instructions = [{"prediction": "...", "question": "...", "context": "..."}]

# Model choice here is illustrative; pick any model your setup supports.
asyncio.run(
    generator.generate(
        instructions=instructions,
        singletons=EvaluatePrediction,
        models=Model.GPT4O,
    )
)

Expected output

{
  "question": "Was Pope helpful in defense of Constantinople?",
  "prediction": "Based on the information provided, it appears that Pope Nicholas V's efforts were unlikely to be significantly helpful in defending Constantinople. The fact that many Western rulers were wary of increasing papal control and had financial constraints due to their own internal conflicts and wars suggests that they would not have been able or willing to contribute substantially to a defense effort",
  "evaluation": "[correct]",
  "model": "anthropic/claude-3-5-haiku-20241022:beta"
}
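Each generated record is a plain JSON object with the output fields listed earlier, so it can be consumed with standard tooling. A minimal sketch of reading one record (the bracket-stripping step is an assumption about the `[correct]`-style feedback format, not a documented API):

```python
import json

# A record shaped like the expected output above (values shortened).
record_json = """
{
  "question": "Was the defense adequate?",
  "prediction": "It appears unlikely.",
  "evaluation": "[correct]",
  "model": "anthropic/claude-3-5-haiku-20241022:beta"
}
"""

record = json.loads(record_json)
# The evaluation field carries bracketed feedback; strip the brackets to get the verdict.
verdict = record["evaluation"].strip("[]")
print(verdict)  # -> correct
```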