Running Evaluations

This guide will show you how to run evaluations from the UI.

Prerequisites

Before you get started, make sure that you have created a test set and configured evaluators appropriate for your task.

Starting an evaluation

To start an evaluation, navigate to the Evaluations page and click the Start new evaluation button. A modal appears where you can set up the evaluation.

Setting up evaluation parameters

In the modal, specify the following:

  • Testset: Choose the test set(s) for your evaluation
  • Variants: Choose one or more variants to evaluate
  • Evaluators: Pick one or more evaluators for assessment

Advanced configuration

Additional settings let you adjust batching and retry parameters for LLM calls. Lowering the batch size or lengthening the delays helps mitigate rate limit errors from your LLM provider.

Advanced configuration options include:

  • Batch Size: Number of test cases to run concurrently in each batch (default: 10)
  • Retry Delay: Time to wait before retrying a failed call (default: 3s)
  • Max Retries: Maximum number of retry attempts for a failed call (default: 3)
  • Delay Between Batches: Pause duration between batch runs (default: 5s)
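To make the interaction between these four settings concrete, here is a minimal Python sketch of a batched runner with retries. It is an illustration only, not the platform's actual implementation: the names `BatchConfig`, `call_with_retries`, and `run_in_batches` are hypothetical, and the sketch assumes a failed call is retried after a fixed delay and recorded as `None` once retries are exhausted.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List, Optional


@dataclass
class BatchConfig:
    # Hypothetical field names; the defaults mirror the values listed above.
    batch_size: int = 10                # test cases run concurrently per batch
    retry_delay: float = 3.0            # seconds to wait before retrying a failed call
    max_retries: int = 3                # retry attempts after the first failure
    delay_between_batches: float = 5.0  # seconds to pause between batches


async def call_with_retries(
    call: Callable[[Any], Awaitable[Any]], item: Any, cfg: BatchConfig
) -> Optional[Any]:
    """Run one call, retrying up to cfg.max_retries times on failure."""
    for attempt in range(cfg.max_retries + 1):
        try:
            return await call(item)
        except Exception:
            if attempt == cfg.max_retries:
                return None  # assumption: an exhausted test case is recorded as None
            await asyncio.sleep(cfg.retry_delay)
    return None


async def run_in_batches(
    items: List[Any], call: Callable[[Any], Awaitable[Any]], cfg: BatchConfig
) -> List[Optional[Any]]:
    """Process items batch by batch, pausing between consecutive batches."""
    results: List[Optional[Any]] = []
    for start in range(0, len(items), cfg.batch_size):
        batch = items[start:start + cfg.batch_size]
        # All calls in a batch run concurrently; results keep the input order.
        results.extend(
            await asyncio.gather(*(call_with_retries(call, i, cfg) for i in batch))
        )
        if start + cfg.batch_size < len(items):
            await asyncio.sleep(cfg.delay_between_batches)
    return results
```

Under these assumptions, a smaller batch size and a longer delay between batches both reduce the request rate seen by the provider, at the cost of a longer total run time.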

Monitoring evaluation progress

Once you start an evaluation:

  1. The evaluation will appear in the evaluations list
  2. You'll see the status (Running, Completed, Failed)
  3. Progress indicators show how many test cases have been processed
  4. You can view partial results while the evaluation is running
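The monitoring steps above amount to polling a status until it leaves the Running state. The sketch below shows that loop in Python under stated assumptions: `Status`, `Progress`, and `wait_for_evaluation` are hypothetical names, and `get_progress` stands in for whatever mechanism returns the current state. No real API of the platform is implied.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    # The three statuses shown in the evaluations list.
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Progress:
    status: Status
    processed: int  # test cases processed so far
    total: int      # test cases in the evaluation


def wait_for_evaluation(
    get_progress: Callable[[], Progress],  # hypothetical status source
    poll_interval: float = 2.0,
    on_update: Callable[[str], None] = print,
) -> Progress:
    """Poll until the evaluation is no longer Running, reporting each snapshot."""
    while True:
        progress = get_progress()
        on_update(
            f"{progress.status.value}: "
            f"{progress.processed}/{progress.total} test cases"
        )
        if progress.status is not Status.RUNNING:
            return progress
        time.sleep(poll_interval)
```

Each snapshot reported while the status is still Running corresponds to the partial results visible during the run.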

Next steps