Running Evaluations

This guide will show you how to run evaluations from the UI.

Prerequisites

Before you get started, make sure that you have created a test set and configured evaluators appropriate for your task.

Starting an evaluation

To start an evaluation, navigate to the Evaluations page and click the Start new evaluation button. A modal will appear, allowing you to set up the evaluation.

Setting up evaluation parameters

In the modal, specify the following:

Testset: Choose one or more test sets for your evaluation
Variants: Choose one or more variants to evaluate
Evaluators: Pick one or more evaluators for assessment

Advanced configuration

Additional settings let you adjust batching and retry parameters for LLM calls. Smaller batches and longer delays reduce the number of requests sent at once, which helps mitigate rate-limit errors from your LLM provider.

Advanced configuration options include:

Batch Size: Number of test cases to run concurrently in each batch (default: 10)
Retry Delay: Time to wait before retrying a failed call (default: 3s)
Max Retries: Maximum number of retry attempts for a failed call (default: 3)
Delay Between Batches: Pause duration between batch runs (default: 5s)
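
To make the interaction between these four settings concrete, here is a minimal Python sketch of a batching loop with retries. It is an illustration only, not the platform's actual implementation: the function names (run_in_batches, call_with_retries), the use of a thread pool, and the choice to re-raise once retries are exhausted are all assumptions made for this example. The default values mirror the ones listed above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retries(fn, case, max_retries=3, retry_delay=3.0, sleep=time.sleep):
    """Call fn(case); after a failure, wait retry_delay seconds and try again.

    Hypothetical helper: makes at most 1 + max_retries attempts, then re-raises.
    """
    attempt = 0
    while True:
        try:
            return fn(case)
        except Exception:
            if attempt >= max_retries:
                raise  # assumption: give up and surface the last error
            attempt += 1
            sleep(retry_delay)


def run_in_batches(cases, fn, batch_size=10, retry_delay=3.0, max_retries=3,
                   delay_between_batches=5.0, sleep=time.sleep):
    """Run fn over cases in batches of batch_size, pausing between batches.

    Cases within a batch run concurrently; results keep the input order.
    """
    results = []
    for start in range(0, len(cases), batch_size):
        if start > 0:
            sleep(delay_between_batches)  # pause before every batch except the first
        batch = cases[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results.extend(
                pool.map(
                    lambda c: call_with_retries(fn, c, max_retries, retry_delay, sleep),
                    batch,
                )
            )
    return results
```

Under these assumptions, lowering the batch size or raising the delay between batches lowers the peak request rate, while the retry settings control how long a single failing call is given to recover. For example, 25 test cases with a batch size of 10 run as three batches with two pauses between them.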

Monitoring evaluation progress

Once you start an evaluation:

The evaluation will appear in the evaluations list
You'll see the status (Running, Completed, Failed)
Progress indicators show how many test cases have been processed
You can view partial results while the evaluation is running

Next steps

Learn how to view evaluation results
Understand how to compare evaluations
Try human evaluation for expert feedback
