
Prompt Versioning

We've introduced prompt versioning, allowing you to track changes made by the team and revert to previous versions. To view a configuration's change history, click the icon in the playground to access all previous versions.


New JSON Evaluator

We have added a new evaluator that matches JSON fields, and added the possibility to use test set columns other than the correct_answer column as the ground truth.
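For illustration only (this is not the evaluator's actual implementation), matching a JSON field against a value taken from a chosen test set column boils down to something like the following sketch; `json_field_match` is a hypothetical helper name.

```python
import json

def json_field_match(output: str, expected: dict, field: str) -> bool:
    """Parse the app's output as JSON and compare a single field against
    the expected value taken from the chosen test set column."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False  # invalid JSON counts as a mismatch
    return parsed.get(field) == expected.get(field)

# Example: grade the "city" field using a column other than correct_answer
print(json_field_match('{"city": "Paris"}', {"city": "Paris"}, "city"))  # True
```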


Improved error handling in evaluation

We have improved error handling in evaluation to return more information about the exact source of the error in the evaluation view.

Improvements:

  • Added the option in A/B testing human evaluation to mark both variants as correct
  • Improved loading state in Human Evaluation

Bring your own API key

Up until now, we required users to use our OpenAI API key when using the cloud version. Starting now, you can use your own API key for any new application you create.
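As a minimal sketch, assuming your key is exposed to the application as the standard OPENAI_API_KEY environment variable, your code can pick it up like any other secret:

```python
import os

# Illustrative only: once your own key is injected as an environment
# variable, application code reads it like any other secret.
openai_api_key = os.environ.get("OPENAI_API_KEY")
if openai_api_key is None:
    raise RuntimeError("OPENAI_API_KEY is not set for this application")
```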


Improved human evaluation workflow

We have made the human evaluation workflow faster by updating the table view to add annotation and correct answer columns.

Improvements:

  • Simplified the database migration process
  • Fixed environment variable injection to enable cloud users to use their own keys
  • Disabled import from endpoint in cloud for security reasons
  • Improved query lookup speed for evaluation scenarios
  • Improved error handling in playground

Bug fixes:

  • Resolved failing Backend tests
  • Fixed a bug in rate limit configuration validation
  • Fixed issue with all aggregated results
  • Resolved issue with live results in A/B testing evaluation not updating

Revamping evaluation

We've spent the past month re-engineering our evaluation workflow. Here's what's new:

Running Evaluations

  1. Simultaneous Evaluations: You can now run multiple evaluations for different app variants and evaluators concurrently.
  2. Rate Limit Parameters: Specify these during evaluations and reattempts to ensure reliable results without exceeding OpenAI rate limits.
  3. Reusable Evaluators: Configure evaluators such as similarity match, regex match, or AI critique once and use them across multiple evaluations (see the sketch after this list).
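To make the ideas concrete, here is a rough conceptual sketch in plain Python (not the agenta SDK API) of a reusable regex-match evaluator and of retrying a call with backoff when a provider rate limit is hit; `regex_match_evaluator` and `call_with_retries` are illustrative names only.

```python
import re
import time

def regex_match_evaluator(pattern: str):
    """Hypothetical reusable evaluator: returns a scoring function that
    gives 1.0 if the LLM output matches the regex, else 0.0."""
    compiled = re.compile(pattern)

    def evaluate(output: str) -> float:
        return 1.0 if compiled.search(output) else 0.0

    return evaluate

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 2.0):
    """Hypothetical helper: retry a call with exponential backoff when the
    provider rejects it, e.g. because of a rate limit."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# The same configured evaluator can be reused across evaluations.
is_json_like = regex_match_evaluator(r"^\s*\{.*\}\s*$")
print(is_json_like('{"answer": 42}'))  # 1.0
```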

Evaluation Reports

  1. Dashboard Improvements: We've upgraded our dashboard interface to better display evaluation results. You can now filter and sort results by evaluator, test set, and outcomes.
  2. Comparative Analysis: Select multiple evaluation runs and view the results of various LLM applications side-by-side.

Adding Cost and Token Usage to the Playground

Caution: This change requires you to pull the latest version of the agenta platform if you're using the self-serve version.

We've added a feature that allows you to compare the time taken by an LLM app, its cost, and track token usage, all in one place.
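As a rough sketch of how cost can be derived from token usage (the per-1K-token prices below are placeholders, not the actual rates of any model):

```python
# Hypothetical per-1K-token prices; actual rates depend on the model used.
PROMPT_PRICE_PER_1K = 0.0015
COMPLETION_PRICE_PER_1K = 0.0020

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of a single LLM call from its token usage."""
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K \
        + (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K

# Example: 320 prompt tokens and 150 completion tokens
print(round(estimate_cost(320, 150), 6))  # 0.00078
```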


Comprehensive Updates and Bug Fixes

  • Incorporated all chat turns into the chat set
  • Rectified self-hosting documentation
  • Introduced asynchronous support for applications
  • Added 'register_default' alias
  • Fixed a bug in the side-by-side feature