No-code Evaluation

This guide will show you how to run evaluations from the UI.

Before you get started, make sure that you have created a test set and configured evaluators appropriate for your task.

Running Evaluations

To start an evaluation, navigate to the Evaluations page and click the Start new evaluation button. A modal will appear, allowing you to set up the evaluation.

Setting Up Evaluation Parameters

In the modal, specify the following:

Testset: Choose one or more test sets for your evaluation.
Variants: Choose one or more variants to evaluate.
Evaluators: Pick one or more evaluators for assessment.
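The three selections can be pictured as a cross product. The Python sketch below is a hypothetical illustration, not Agenta's API: `plan_runs` and `EvaluationRun` are made-up names. Based on the per variant/test set rows in the results summary, it assumes that one evaluation is created for each variant/test set pair and that every selected evaluator scores it.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class EvaluationRun:
    """Hypothetical evaluation job: one variant run against one test set."""
    testset: str
    variant: str
    evaluators: Tuple[str, ...]


def plan_runs(testsets: List[str], variants: List[str], evaluators: List[str]) -> List[EvaluationRun]:
    # Assumption: every selected variant is paired with every selected test set,
    # and all selected evaluators score each pairing.
    return [
        EvaluationRun(testset=t, variant=v, evaluators=tuple(evaluators))
        for t, v in product(testsets, variants)
    ]


# Usage: one test set, two variants, two evaluators (all names are illustrative).
runs = plan_runs(["qa_testset"], ["app.v1", "app.v2"], ["exact_match", "similarity"])
for run in runs:
    print(run)
```

Under this assumption, selecting 2 test sets and 3 variants would produce 6 evaluations, each scored by every chosen evaluator.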

Advanced Configuration

Additional settings let you adjust batching and retry parameters for LLM calls. These settings help mitigate rate limit errors from your LLM provider.

Advanced configuration options include:

Batch Size: Number of test cases to run concurrently in each batch (default: 10).
Retry Delay: Time to wait before retrying a failed call (default: 3s).
Max Retries: Maximum number of retry attempts for a failed call (default: 3).
Delay Between Batches: Pause duration between batch runs (default: 5s).
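The following Python sketch illustrates how these four settings could interact. It is a minimal illustration under stated assumptions, not Agenta's implementation. `run_in_batches` and `call_with_retries` are hypothetical names. The sketch also assumes that the retry delay is a fixed wait with no exponential backoff, and it reads Max Retries as the number of retries after the first attempt. The default values mirror the ones listed above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_with_retries(fn: Callable[[T], R], item: T, max_retries: int, retry_delay: float) -> R:
    """Call fn(item); after each failure, wait retry_delay seconds and retry, up to max_retries times."""
    retries = 0
    while True:
        try:
            return fn(item)
        except Exception:
            if retries >= max_retries:
                raise  # the first attempt and all max_retries retries failed
            retries += 1
            time.sleep(retry_delay)  # assumption: fixed delay, no backoff


def run_in_batches(
    fn: Callable[[T], R],
    test_cases: List[T],
    batch_size: int = 10,
    retry_delay: float = 3.0,
    max_retries: int = 3,
    delay_between_batches: float = 5.0,
) -> List[R]:
    """Hypothetical runner: process test_cases in concurrent batches, keeping input order."""
    results: List[R] = []
    for start in range(0, len(test_cases), batch_size):
        batch = test_cases[start:start + batch_size]
        # All test cases in a batch run concurrently, each with its own retries.
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            futures = [
                pool.submit(call_with_retries, fn, case, max_retries, retry_delay)
                for case in batch
            ]
            results.extend(f.result() for f in futures)
        # Pause between batches (but not after the last one) to ease rate limits.
        if start + batch_size < len(test_cases):
            time.sleep(delay_between_batches)
    return results


# Usage: a stand-in "LLM call" that is rate limited on its first attempt for each input.
attempts: Dict[str, int] = {}


def flaky_llm_call(prompt: str) -> str:
    attempts[prompt] = attempts.get(prompt, 0) + 1
    if attempts[prompt] == 1:
        raise RuntimeError("rate limited")
    return prompt.upper()


outputs = run_in_batches(flaky_llm_call, ["a", "b", "c"], batch_size=2,
                         retry_delay=0.1, delay_between_batches=0.1)
print(outputs)  # ['A', 'B', 'C']
```

In this model, lowering Batch Size or raising Delay Between Batches reduces the rate of requests sent to the provider, but the evaluation takes longer to finish.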

Analyzing Evaluation Results

The main view offers an aggregated summary of results. Each row is an evaluation for one variant/test set combination, and each evaluator column shows that evaluator's average score. You'll also see average latency, total cost, creation date, and evaluation status.
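The Python sketch below shows roughly how the aggregation works. The hypothetical `summarize` function reduces one evaluation's per-test-case results to the figures shown in the summary: an average score per evaluator, the average latency, and the total cost. It assumes that every evaluator produces a numeric score for each test case. The names and data layout are illustrative and are not taken from Agenta.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RowResult:
    """Hypothetical result for one test case in one evaluation."""
    scores: Dict[str, float]  # evaluator name -> numeric score (assumption)
    latency_s: float
    cost_usd: float


def summarize(rows: List[RowResult]) -> Dict[str, float]:
    summary: Dict[str, float] = {}
    evaluators = {name for row in rows for name in row.scores}
    for name in sorted(evaluators):
        # Average each evaluator's score over the test cases it scored.
        summary[f"avg_{name}"] = mean(row.scores[name] for row in rows if name in row.scores)
    summary["avg_latency_s"] = mean(row.latency_s for row in rows)  # latency is averaged
    summary["total_cost_usd"] = sum(row.cost_usd for row in rows)   # cost is summed
    return summary


# Usage with two made-up test case results.
rows = [
    RowResult({"exact_match": 1.0, "similarity": 0.9}, latency_s=1.2, cost_usd=0.002),
    RowResult({"exact_match": 0.0, "similarity": 0.6}, latency_s=0.8, cost_usd=0.003),
]
print(summarize(rows))
```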

For a detailed view of an evaluation, click on a completed evaluation row.

The detailed evaluation table has columns for the inputs, the reference answers used by evaluators, the LLM application's output, the evaluator results, cost, and latency.

Comparing Evaluations

Once evaluations are marked "completed," you can compare two or more evaluations from the same test set. Click the Compare button to access the Evaluation comparison view, where you can analyze outputs from multiple evaluations side by side.
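The comparison rule above can be written as a small predicate. This is a hypothetical sketch rather than Agenta code. `EvaluationSummary` and `can_compare` are made-up names, and the only status this document names is "completed", so any other status strings in the sketch are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvaluationSummary:
    """Hypothetical minimal record of an evaluation listed in the main view."""
    id: str
    testset: str
    status: str


def can_compare(evaluations: List[EvaluationSummary]) -> bool:
    """True if two or more evaluations are selected, all completed, all on the same test set."""
    if len(evaluations) < 2:
        return False
    if any(e.status != "completed" for e in evaluations):
        return False
    return len({e.testset for e in evaluations}) == 1


# Usage: two completed evaluations on the same test set can be compared.
a = EvaluationSummary("eval-1", "qa_testset", "completed")
b = EvaluationSummary("eval-2", "qa_testset", "completed")
print(can_compare([a, b]))
```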

Animation showing how to compare evaluations in Agenta
