Viewing Evaluation Results

Overview

Once your evaluation completes, Agenta provides comprehensive views to analyze the results and understand your LLM application's performance.

The main view offers an aggregated summary of results.

The test cases evaluation tab provides a detailed view of each test case.

The evaluation table columns show:

Inputs: The input data from your test set
Reference Answers: The expected/correct answers used by evaluators
LLM Output: The actual output from your application
Evaluator Results: Scores or boolean values from each evaluator
Cost: The cost of running this test case
Latency: How long the test case took to execute

If you click on a test case, you will see a drawer with the full output and the evaluator results.

The prompt configuration tab shows the prompt configuration used for this evaluation.

Export your evaluation results for further analysis:
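
As one illustration of such analysis, the sketch below aggregates exported results offline using only the Python standard library. It assumes the export is a CSV file with one row per test case. The column names (`exact_match`, `cost`, `latency`, and so on) and the sample rows are hypothetical and only mirror the table columns described above, so check them against your actual export before reusing the code.

```python
import csv
import io
from statistics import mean

# Hypothetical export contents. The column names and values are assumptions
# made for illustration; they are not Agenta's documented export schema.
SAMPLE_EXPORT = """\
inputs,reference_answer,llm_output,exact_match,cost,latency
What is 2+2?,4,4,true,0.0010,0.8
Capital of France?,Paris,Lyon,false,0.0020,1.2
Largest planet?,Jupiter,Jupiter,true,0.0030,1.0
"""


def summarize(csv_text, evaluator_column="exact_match"):
    """Aggregate per-test-case rows into a small summary dictionary."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("export contains no test cases")

    # Treat the evaluator column as a boolean stored as text ("true"/"false").
    passed = [row[evaluator_column].strip().lower() == "true" for row in rows]

    return {
        "test_cases": len(rows),
        "pass_rate": sum(passed) / len(rows),
        "total_cost": sum(float(row["cost"]) for row in rows),
        "mean_latency": mean(float(row["latency"]) for row in rows),
    }


summary = summarize(SAMPLE_EXPORT)
print(summary)
```

To run this against a real file, read its text with `open(path, newline="").read()` and pass it to `summarize`, adjusting `evaluator_column` and the cost and latency column names to match the headers in your export.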