Viewing Results

Overview

After completing annotations, you can review and export the results of your human evaluation.

Viewing results

The Results section shows:

  • Aggregated scores across all test cases
  • Individual annotations for each test case
  • Evaluator performance metrics
  • Comments and feedback provided by annotators

Comparing with other experiments

You can compare human evaluation results with:

  • Automated evaluation runs
  • Other human evaluation sessions
  • Different variants or versions

This helps you understand how human judgment aligns with automated metrics and identify areas for improvement.
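One quick way to check that alignment offline is to join a human evaluation export with an automated run export on a shared test-case identifier and compute a correlation. The sketch below is a minimal illustration, not a built-in feature: the file names and the test_case_id and score columns are assumptions about how your exports are structured.

```python
# Sketch: compare human scores with an automated metric.
# File names and the test_case_id / score columns are assumptions; check your exports.
import pandas as pd

human = pd.read_csv("human_eval_results.csv")           # assumed human evaluation export
automated = pd.read_csv("automated_eval_results.csv")   # assumed automated run export

merged = human.merge(automated, on="test_case_id", suffixes=("_human", "_auto"))

# Spearman rank correlation: does the automated metric order test cases
# roughly the same way the annotators did?
correlation = merged["score_human"].corr(merged["score_auto"], method="spearman")
print(f"Human vs. automated score correlation: {correlation:.2f}")

# Test cases where the two disagree most are good candidates for manual review.
merged["gap"] = (merged["score_human"] - merged["score_auto"]).abs()
print(merged.sort_values("gap", ascending=False).head(10))
```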

Exporting results

Export to CSV

Click Export results to download your evaluation data in CSV format. The exported file includes:

  • Test case inputs
  • LLM outputs
  • All annotation scores and feedback
  • Timestamp and annotator information
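Once downloaded, the CSV can be analyzed with any standard tooling. The sketch below summarizes an export with pandas; the column names (input, output, score, annotator, comment) are assumptions, so check the header row of your file for the actual names.

```python
# Sketch: summarize an exported human-evaluation CSV.
# Column names (score, annotator, comment, ...) are assumptions; check your export's header.
import pandas as pd

results = pd.read_csv("human_eval_export.csv")

# Aggregated score across all test cases.
print("Mean score:", results["score"].mean())

# Per-annotator breakdown: average score and number of annotations.
per_annotator = results.groupby("annotator")["score"].agg(["mean", "count"])
print(per_annotator)

# Free-text feedback for low-scoring cases.
low = results[results["score"] <= 2]
print(low[["input", "output", "comment"]].head())
```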

Saving as test set

Click Save test set to create a new test set from your annotated data. This is useful for:

  • Bootstrapping automated evaluation with human-validated examples
  • Creating regression test suites
  • Building training data for custom evaluators
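If you prefer to build a test set programmatically instead of using the Save test set button, the sketch below filters the exported annotations for examples the annotators rated highly and writes them out as a JSONL test set. The score threshold, file names, and column names are assumptions, not product defaults.

```python
# Sketch: bootstrap a test set from human-validated examples in the CSV export.
# The >= 4 threshold (on an assumed 1-5 scale) and the column names are assumptions.
import json
import pandas as pd

results = pd.read_csv("human_eval_export.csv")

# Keep only cases the annotators validated as good.
validated = results[results["score"] >= 4]

with open("regression_test_set.jsonl", "w", encoding="utf-8") as f:
    for _, row in validated.iterrows():
        f.write(json.dumps({
            "input": row["input"],
            "expected_output": row["output"],  # human-approved output as the reference
        }) + "\n")

print(f"Wrote {len(validated)} human-validated examples.")
```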

Use cases for exported data

  • Analysis: Perform statistical analysis on evaluation results
  • Reporting: Create reports for stakeholders
  • Training: Use annotations to train or fine-tune models
  • Quality Assurance: Track quality metrics over time
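As a concrete example of the quality-assurance use case, the sketch below tracks the average score per day across several exports collected in a local folder. The exports/ directory, the timestamp column name, and its format are assumptions.

```python
# Sketch: track quality metrics over time across exported evaluation runs.
# The exports/ folder and the "timestamp" column are assumptions about your setup.
import glob
import pandas as pd

frames = [pd.read_csv(path) for path in glob.glob("exports/*.csv")]
history = pd.concat(frames, ignore_index=True)

history["date"] = pd.to_datetime(history["timestamp"]).dt.date
trend = history.groupby("date")["score"].mean()
print(trend)  # mean human score per day, suitable for plotting or reporting
```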

Next steps