Running Evaluations
This guide will show you how to run evaluations from the UI.
Prerequisites
Before you get started, make sure that you have created a test set and configured evaluators appropriate for your task.
Starting an evaluation
To start an evaluation, navigate to the Evaluations page and click the Start new evaluation button. A modal will appear where you can set up the evaluation.
Setting up evaluation parameters
In the modal, specify the following:
- Testset: Choose the test set(s) for your evaluation
- Variants: Choose one or more variants to evaluate
- Evaluators: Pick one or more evaluators for assessment
Advanced configuration
Additional settings let you adjust the batching and retry parameters used for LLM calls. Lowering the batch size or increasing the delays reduces the request rate, which helps mitigate rate-limit errors from your LLM provider.
Advanced configuration options include:
- Batch Size: Number of test cases to run concurrently in each batch (default: 10)
- Retry Delay: Time to wait before retrying a failed call (default: 3s)
- Max Retries: Maximum number of retry attempts for a failed call (default: 3)
- Delay Between Batches: Pause duration between batch runs (default: 5s)
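To make the interaction between these four settings concrete, here is a minimal Python sketch of a batched runner with retries. It is not the platform's implementation: the names `run_in_batches` and `call_with_retries`, the use of a thread pool for concurrency, and the fixed (non-exponential) retry delay are all assumptions made for illustration. Only the default values come from the list above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retries(call, case, retry_delay, max_retries, sleep):
    """Run call(case); on failure, wait retry_delay and retry up to max_retries times."""
    for attempt in range(max_retries + 1):  # 1 initial attempt + max_retries retries
        try:
            return call(case)
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(retry_delay)


def run_in_batches(
    cases,
    call,
    batch_size=10,              # Batch Size
    retry_delay=3.0,            # Retry Delay, in seconds
    max_retries=3,              # Max Retries
    delay_between_batches=5.0,  # Delay Between Batches, in seconds
    sleep=time.sleep,           # injectable so the sketch can be tested without waiting
):
    """Hypothetical runner: process cases batch by batch, concurrently within a batch."""
    cases = list(cases)
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(cases), batch_size):
            if start > 0:
                sleep(delay_between_batches)  # pause before every batch except the first
            batch = cases[start:start + batch_size]
            # pool.map returns results in input order
            results.extend(
                pool.map(
                    lambda c: call_with_retries(call, c, retry_delay, max_retries, sleep),
                    batch,
                )
            )
    return results
```

Under these assumptions, a test set of 25 cases with the defaults runs as three batches (10, 10, and 5 cases) with two 5-second pauses between them, and a case that keeps failing is attempted four times in total before its error is reported.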
Monitoring evaluation progress
Once you start an evaluation:
- The evaluation will appear in the evaluations list
- You'll see the status (Running, Completed, Failed)
- Progress indicators show how many test cases have been processed
- You can view partial results while the evaluation is running
Next steps
- Learn how to view evaluation results
- Understand how to compare evaluations
- Try human evaluation for expert feedback