Viewing Results
Overview
After completing annotations, you can review and export the results of your human evaluation.
Viewing results
The Results section shows:
- Aggregated scores across all test cases
- Individual annotations for each test case
- Evaluator performance metrics
- Comments and feedback provided by annotators
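If you want to reproduce the aggregated scores outside the UI, they are essentially per-test-case averages of the individual annotations. A minimal sketch, assuming hypothetical column names ("test_case_id", "annotator", "score") that you would adjust to match your data:

```python
# Sketch: reproducing aggregated scores from individual annotations.
# Column names are hypothetical placeholders; adapt them to your data.
import pandas as pd

annotations = pd.DataFrame(
    {
        "test_case_id": ["tc-1", "tc-1", "tc-2", "tc-2"],
        "annotator": ["alice", "bob", "alice", "bob"],
        "score": [4, 5, 2, 3],
    }
)

# Aggregated score per test case = mean of all annotator scores for that case.
aggregated = annotations.groupby("test_case_id")["score"].mean()
print(aggregated)
```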
Comparing with other experiments
You can compare human evaluation results with:
- Automated evaluation runs
- Other human evaluation sessions
- Different variants or versions
This helps you understand how human judgment aligns with automated metrics and identify areas for improvement.
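One way to quantify that alignment is to correlate human scores with an automated metric on the same test cases. The sketch below assumes two exports that share a "test_case_id" column and a numeric "score" column; the file and column names are placeholders, not the platform's fixed schema.

```python
# Sketch: checking how human judgment aligns with an automated metric.
# File and column names are assumptions; adjust to your actual exports.
import pandas as pd
from scipy.stats import spearmanr

human = pd.read_csv("human_eval_results.csv")          # hypothetical file name
automated = pd.read_csv("automated_eval_results.csv")  # hypothetical file name

merged = human.merge(automated, on="test_case_id", suffixes=("_human", "_auto"))
rho, p_value = spearmanr(merged["score_human"], merged["score_auto"])
print(f"Spearman correlation: {rho:.2f} (p={p_value:.3f})")

# Large disagreements highlight test cases worth reviewing manually.
disagreements = merged.assign(diff=(merged["score_human"] - merged["score_auto"]).abs())
print(disagreements.nlargest(5, "diff"))
```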
Exporting results
Export to CSV
Click Export results to download your evaluation data in CSV format. The exported file includes:
- Test case inputs
- LLM outputs
- All annotation scores and feedback
- Timestamp and annotator information
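Because the export is plain CSV, you can load it directly into your analysis tool of choice. A minimal sketch with pandas, using illustrative column names (check the header row of your export for the exact ones):

```python
# Sketch: loading the exported CSV for further analysis.
# The column names referenced below are illustrative placeholders.
import pandas as pd

results = pd.read_csv("evaluation_export.csv")  # hypothetical file name
print(results.columns.tolist())

# Example: average annotation score per annotator.
print(results.groupby("annotator")["score"].mean())
```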
Saving as test set
Click Save test set to create a new test set from annotated data. This is useful for:
- Bootstrapping automated evaluation with human-validated examples
- Creating regression test suites
- Building training data for custom evaluators
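If you prefer to build such a test set programmatically from the exported data, the sketch below filters for highly rated cases and stores the human-approved output as the expected answer. The score threshold, column names, and output format are assumptions; adapt them to your platform's test set schema.

```python
# Sketch: turning annotated rows into a reusable, human-validated test set.
# Column names, threshold, and output format are assumptions.
import json
import pandas as pd

results = pd.read_csv("evaluation_export.csv")  # hypothetical file name

# Keep only cases the annotators rated highly.
approved = results[results["score"] >= 4]
test_set = [
    {"input": row["input"], "expected_output": row["output"]}
    for _, row in approved.iterrows()
]

with open("human_validated_test_set.json", "w") as f:
    json.dump(test_set, f, indent=2)
```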
Use cases for exported data
- Analysis: Perform statistical analysis on evaluation results
- Reporting: Create reports for stakeholders
- Training: Use annotations to train or fine-tune models
- Quality Assurance: Track quality metrics over time
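For the quality-assurance use case, repeated exports can be combined into a simple trend over time. A sketch assuming a "timestamp" column and a numeric "score" column (both names are placeholders for the fields in your export):

```python
# Sketch: tracking quality metrics over time from exported results.
# Assumes "timestamp" and "score" columns; both names are placeholders.
import pandas as pd

results = pd.read_csv("evaluation_export.csv")  # hypothetical file name
results["timestamp"] = pd.to_datetime(results["timestamp"])

# Weekly mean score shows whether quality is trending up or down.
weekly = results.set_index("timestamp")["score"].resample("W").mean()
print(weekly)
```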
Next steps
- Learn about A/B testing
- Explore automated evaluation
- Learn how to configure evaluators