25 May 2024

New LLM Provider: Welcome Gemini!

We are excited to announce the addition of Google’s Gemini to our list of supported LLM providers, bringing the total number to 12.


24 May 2024

Playground Improvements

  • We’ve improved the workflow for adding outputs to a test set in the playground. Previously, you had to select the name of the test set each time; now, the last used test set is selected by default.
  • We have significantly improved the debugging experience when creating applications from code. Now, if an application fails, you can view the logs to understand the reason behind the failure.
  • We moved the copy message button in the playground to the output text area.
  • We now hide the cost and usage in the playground when they aren’t specified.
  • We’ve improved the error messages in the playground.

Bug Fixes

  • Fixed the order of the arguments when running a custom code evaluator
  • Fixed the timestamp in the Testset view (previously, timestamps were dropping the trailing 0)
  • Fixed the creation of applications from code in the self-hosted version on Windows


1 May 2024

Prompt and Configuration Registry

We’ve introduced a feature that allows you to use Agenta as a prompt registry or management system. In the deployment view, we now provide an endpoint to directly fetch the latest version of your prompt. Here is what it looks like:

from agenta import Agenta
agenta = Agenta() 
config = agenta.get_config(base_id="xxxxx", environment="production", cache_timeout=200) # Fetches the configuration with caching

You can find additional documentation here.
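The `cache_timeout` argument suggests that fetched configurations are cached for a limited time. As a rough illustration of that idea (not the SDK's actual implementation), here is a minimal time-to-live cache wrapped around a hypothetical `fetch` function. The class name, the fetch function, and the assumption that the timeout is measured in seconds are all assumptions made for this sketch; check the SDK documentation for the real behavior.

```python
import time
from typing import Any, Callable, Dict, Tuple

Config = Dict[str, Any]


class TTLConfigCache:
    """Hypothetical time-to-live cache around a config fetcher (not Agenta's code)."""

    def __init__(
        self,
        fetch: Callable[[str, str], Config],
        ttl_seconds: float,  # Assumption: timeout expressed in seconds.
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps (base_id, environment) -> (time fetched, config).
        self._entries: Dict[Tuple[str, str], Tuple[float, Config]] = {}

    def get(self, base_id: str, environment: str) -> Config:
        key = (base_id, environment)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # Still fresh: skip the remote call.
        config = self._fetch(base_id, environment)
        self._entries[key] = (now, config)
        return config
```

Injecting the clock keeps the cache easy to test without waiting for real time to pass.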


  • Previously, publishing a variant from the playground to an environment was a manual process. From now on, we publish to the production environment by default.


28 April 2024


  • The total cost of an evaluation is now displayed in the evaluation table. This allows you to understand how much evaluations are costing you and track your expenses.

Bug Fixes

  • Fixed sidebar focus in automatic evaluation results view
  • Fixed the incorrect URLs shown when running ‘agenta variant serve’

v0.13.1-5 - Evaluation Speed Increase and Numerous Quality of Life Improvements

23rd April 2024

  • We’ve improved the speed of evaluations by 3x through the use of asynchronous batching of calls.
  • We’ve added Groq as a new provider along with Llama3 to our playground.
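Asynchronous batching generally means sending several LLM calls concurrently, one batch at a time, instead of strictly one after another. A minimal asyncio sketch of that pattern follows; `run_in_batches`, `batch_size`, and the stand-in `fake_llm_call` coroutine are illustrative assumptions, not Agenta's internals.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(
    items: Sequence[T],
    call: Callable[[T], Awaitable[R]],
    batch_size: int,
) -> List[R]:
    """Run `call` on every item, with up to `batch_size` calls in flight at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Calls inside one batch run concurrently; gather preserves input order.
        results.extend(await asyncio.gather(*(call(item) for item in batch)))
    return results


async def fake_llm_call(prompt: str) -> str:
    await asyncio.sleep(0.01)  # Stand-in for network latency.
    return prompt.upper()


if __name__ == "__main__":
    outputs = asyncio.run(run_in_batches(["a", "b", "c", "d", "e"], fake_llm_call, batch_size=2))
    print(outputs)
```

Because each batch waits on network latency concurrently, total wall time scales with the number of batches rather than the number of calls.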

Bug Fixes

  • Resolved a rendering UI bug in Testset view.
  • Fixed incorrect URLs displayed when running the ‘agenta variant serve’ command.
  • Corrected timestamps in the configuration.
  • Resolved errors when using the chat template with empty input.
  • Fixed latency format in evaluation view.
  • Added a spinner to the Human Evaluation results table.
  • Resolved an issue where the .gitignore file was being overwritten when running ‘agenta init’.

v0.13.0 - Observability (beta)

14th April 2024

You can now monitor your application usage in production. We’ve added a new observability feature (currently in beta), which allows you to:

  • Monitor cost, latency, and the number of calls to your applications in real-time.
  • View the logs of your LLM calls, including inputs, outputs, and used configurations. You can also add any interesting logs to your test set.
  • Trace more complex LLM applications to understand their internal logic and debug them.

As of now, all newly created applications include observability by default. We are working towards a GA version in the coming weeks, which will be scalable and better integrated with your applications. We will also be adding tutorials and documentation about it.

Find examples of LLM apps created from code with observability here.
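Tracing typically records each step of an application as a span with a name, a parent, and a latency. The decorator below is a minimal, framework-free sketch of that idea; `traced`, `Span`, and the in-memory `SPANS` list are hypothetical and are not the Agenta SDK's observability API.

```python
from __future__ import annotations

import time
from contextvars import ContextVar
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, List, Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]
    latency_s: float = 0.0


SPANS: List[Span] = []  # Hypothetical in-memory sink; a real system would export spans.
_current: ContextVar[Optional[str]] = ContextVar("current_span", default=None)


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record the name, parent span, and latency of every call to `func`."""
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        span = Span(name=func.__name__, parent=_current.get())
        token = _current.set(span.name)  # Nested traced calls see this span as parent.
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            span.latency_s = time.perf_counter() - start
            _current.reset(token)
            SPANS.append(span)  # Appended on exit, so children appear before parents.
    return wrapper


@traced
def retrieve(query: str) -> str:
    return f"context for {query}"


@traced
def generate(query: str) -> str:
    return f"answer using {retrieve(query)}"


if __name__ == "__main__":
    generate("What is Agenta?")
    for span in SPANS:
        print(span)
```

Using a `ContextVar` rather than a global keeps parent tracking correct across threads and asyncio tasks.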

v0.12.6 - Compare latency and costs

1st April 2024

You can now compare the latency and cost of different variants in the evaluation view.

v0.12.5 - Minor improvements

31st March 2024

Toggle variants in comparison view

You can now toggle the visibility of variants in the comparison view, making it easier to compare many variants side by side.


  • You can now add a datapoint from the playground to the test set even if there is a column mismatch

Bug fixes

  • Resolved issue with “Start Evaluation” button in Testset view
  • Fixed a CLI bug that prevented variants from being served

v0.12.4 - New evaluators

25th March 2024

We have added two new evaluators: string matching and Levenshtein distance.
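Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is the standard dynamic-programming version; how Agenta normalizes the distance or converts it into a score is not described here, so the `passes` threshold check is a hypothetical scoring rule.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def passes(output: str, expected: str, threshold: int) -> bool:
    """Hypothetical pass/fail rule: succeed when the edit distance is within threshold."""
    return levenshtein(output, expected) <= threshold


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows of the table reduces memory from O(len(a) * len(b)) to O(len(b)).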


  • Updated documentation for human evaluation
  • Made improvements to Human evaluation card view
  • Added a dialog in the UI indicating that the test set is being saved

Bug fixes

  • Fixed issue with viewing the full output value during evaluation
  • Enhanced error boundary logic to unblock user interface
  • Improved logic to save and retrieve multiple LLM provider keys
  • Fixed Modal instances to support dark mode

v0.12.3 - Minor improvements

11th March 2024

  • Improved the logic of the Webhook evaluator
  • Made the inputs in the Human evaluation view non-editable
  • Added an option to save a test set in the Single model evaluation view
  • Included the evaluator name in the “Configure your evaluator” modal

Bug fixes

  • Fixed column resize in comparison view
  • Resolved a bug affecting the evaluation output in the CSV file
  • Corrected the path to the Evaluators view when navigating from Evaluations

v0.12.2 - Highlight output difference when comparing evaluations

4th March 2024

We have improved the evaluation comparison view to highlight how each output differs from the expected output.
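One common way to highlight such differences is a word-level diff. The sketch uses Python's standard `difflib.SequenceMatcher`; the `[-removed-]` and `{+added+}` markers are an arbitrary plain-text choice and do not reflect how the comparison view actually renders differences.

```python
import difflib
from typing import List


def highlight_diff(expected: str, output: str) -> str:
    """Mark words missing from the output as [-...-] and extra words as {+...+}."""
    expected_words = expected.split()
    output_words = output.split()
    matcher = difflib.SequenceMatcher(a=expected_words, b=output_words)
    parts: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.extend(expected_words[i1:i2])
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(expected_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(output_words[j1:j2]) + "+}")
    return " ".join(parts)


print(highlight_diff("The capital of France is Paris", "The capital of France is Lyon"))
```

Diffing on words rather than characters usually gives more readable highlights for natural-language outputs.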


  • Improved the error messages when invoking LLM applications
  • Improved “Add new evaluation” modal
  • Updated the side menu to display “Configure evaluator” and “Run evaluator” under the Evaluations section
  • Changed cursor to pointer when hovering over evaluation results

v0.12.0 - Deployment Versioning and RBAC

14th February 2024

Deployment versioning

You now have access to a history of prompts deployed to our three environments. This feature allows you to roll back to previous versions if needed.
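Conceptually, deployment versioning can be modeled as an append-only history per environment, where a rollback re-deploys an earlier revision instead of erasing later ones. The sketch below illustrates that model only; `EnvironmentHistory` and its methods are hypothetical and are not part of Agenta.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Config = Dict[str, Any]


@dataclass
class EnvironmentHistory:
    """Hypothetical append-only deployment log for a single environment."""
    revisions: List[Config] = field(default_factory=list)

    def deploy(self, config: Config) -> int:
        # Store a copy so later edits to the caller's dict don't rewrite history.
        self.revisions.append(dict(config))
        return len(self.revisions) - 1

    def current(self) -> Config:
        return self.revisions[-1]

    def rollback(self, revision: int) -> int:
        # Rolling back re-deploys an earlier config as a new revision.
        return self.deploy(self.revisions[revision])


if __name__ == "__main__":
    production = EnvironmentHistory()
    production.deploy({"temperature": 0.2})
    production.deploy({"temperature": 0.9})
    production.rollback(0)
    print(production.current(), len(production.revisions))
```

Recording a rollback as a new revision keeps an audit trail of what was live and when.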

Role-Based Access Control

You can now invite team members and assign them fine-grained roles in agenta.


  • We now prevent the deletion of test sets that are used in evaluations

Bug fixes

  • Fixed a bug in custom code evaluation aggregation. Until now, the aggregated results for custom code evaluations were not computed correctly.
  • Fixed a bug where evaluation results were not exported correctly
  • Updated documentation for vision gpt explain images
  • Improved the frontend tests for Evaluations

v0.10.2 - Minor fixes

4th February 2024

  • Addressed issue when invoking LLM app with missing LLM provider key
  • Updated LLM providers in Backend enum
  • Fixed bug in variant environment deployment
  • Fixed the sorting in evaluation tables
  • Made use of server timezone instead of UTC

v0.10.0 - Prompt Versioning

31st January 2024

We’ve introduced prompt versioning, allowing you to track changes made by your team and revert to previous versions. To view the change history of the configuration, click on the sign in the playground to access all previous versions.

v0.9.1 - New JSON Evaluator

30th January 2024

We have added a new evaluator to match JSON fields, and you can now use columns in the test set other than correct_answer as the ground truth.
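A JSON field evaluator of this kind can be sketched as: parse the application's output, read one field, and compare it with a chosen test set column. The function below is an illustrative assumption (including the string comparison and the `ground_truth_column` parameter name), not Agenta's evaluator code.

```python
import json
from typing import Any, Dict


def json_field_match(
    output: str,
    row: Dict[str, Any],
    field: str,
    ground_truth_column: str = "correct_answer",
) -> bool:
    """Return True if `field` in the JSON output equals the chosen test set column."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False  # Non-JSON output cannot match.
    if not isinstance(parsed, dict) or field not in parsed:
        return False
    # Compare as strings, since test set values loaded from CSV are usually text.
    return str(parsed[field]) == str(row[ground_truth_column])


row = {"correct_answer": "Paris", "expected_city": "Paris"}
print(json_field_match('{"city": "Paris"}', row, "city", ground_truth_column="expected_city"))
```

Making the ground-truth column a parameter is what allows any test set column, not only correct_answer, to serve as the reference.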

v0.9.0 - Improved error handling in evaluation

29th January 2024

We have improved error handling in evaluation to return more information about the exact source of the error in the evaluation view.


  • Added the option in A/B testing human evaluation to mark both variants as correct
  • Improved loading state in Human Evaluation

v0.8.3 - Bring your own API key

25th January 2024

Until now, cloud users had to use our OpenAI API key. Starting now, you can use your own API key for any new application you create.

v0.8.2 - Improved human evaluation workflow

24th January 2024

Faster human evaluation workflow

We have updated the human evaluation table view to add annotation and correct answer columns.


  • Simplified the database migration process
  • Fixed environment variable injection to enable cloud users to use their own keys
  • Disabled import from endpoint in cloud for security reasons
  • Improved query lookup speed for evaluation scenarios
  • Improved error handling in playground

Bug fixes

  • Resolved failing Backend tests
  • Fixed a bug in rate limit configuration validation
  • Fixed issue with all aggregated results
  • Resolved issue with live results in A/B testing evaluation not updating

v0.8.0 - Revamping evaluation

22nd January 2024

We’ve spent the past month re-engineering our evaluation workflow. Here’s what’s new:

Running Evaluations

  1. Simultaneous Evaluations: You can now run multiple evaluations for different app variants and evaluators concurrently.
  2. Rate Limit Parameters: Specify rate limit parameters when running or re-running evaluations to get reliable results without exceeding OpenAI rate limits.
  3. Reusable Evaluators: Configure evaluators such as similarity match, regex match, or AI critique and reuse them across multiple evaluations.
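Rate limit settings usually combine a cap on concurrent requests with retries and exponential backoff. The asyncio sketch below shows that combination; `call_with_limits`, `RateLimitError`, and the parameter names `max_retries` and `base_delay` are hypothetical and may not match Agenta's actual rate limit settings.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


async def call_with_limits(
    call: Callable[[], Awaitable[T]],
    semaphore: asyncio.Semaphore,
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Run `call` under a concurrency cap, retrying rate-limit errors with backoff."""
    async with semaphore:  # Limits how many calls are in flight at once.
        for attempt in range(max_retries + 1):
            try:
                return await call()
            except RateLimitError:
                if attempt == max_retries:
                    raise  # Out of retries: surface the error.
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

This sketch keeps holding the semaphore while backing off, which slows the overall request rate when the provider pushes back; releasing it during the sleep is an equally valid design.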

Evaluation Reports

  1. Dashboard Improvements: We’ve upgraded our dashboard interface to better display evaluation results. You can now filter and sort results by evaluator, test set, and outcomes.
  2. Comparative Analysis: Select multiple evaluation runs and view the results of various LLM applications side-by-side.

v0.7.1 - Adding Cost and Token Usage to the Playground

12th January 2024

This change requires you to pull the latest version of the agenta platform if you’re using the self-hosted version.

We’ve added a feature that allows you to compare the latency and cost of an LLM app and track its token usage, all in one place.

Changes to the SDK

This required changes to the SDK. The LLM application API now returns a JSON object instead of a string. The JSON includes the output message, usage details, and cost:

{
  "message": string,
  "usage": {
    "prompt_tokens": int,
    "completion_tokens": int,
    "total_tokens": int
  },
  "cost": float
}

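For illustration, an application could assemble this response as shown below. The `TypedDict` types mirror the fields listed above; `estimate_cost` and its per-1K-token prices are hypothetical placeholders, since actual prices depend on the provider and model.

```python
import json
from typing import TypedDict


class Usage(TypedDict):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


class AppResponse(TypedDict):
    message: str
    usage: Usage
    cost: float


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Hypothetical pricing: separate per-1K-token rates for input and output."""
    return (prompt_tokens / 1000) * input_price_per_1k + (completion_tokens / 1000) * output_price_per_1k


def build_response(message: str, prompt_tokens: int, completion_tokens: int, cost: float) -> AppResponse:
    """Assemble the message, usage, and cost fields into one JSON-serializable dict."""
    return {
        "message": message,
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
        "cost": cost,
    }


if __name__ == "__main__":
    cost = estimate_cost(1000, 500, input_price_per_1k=0.5, output_price_per_1k=1.5)
    print(json.dumps(build_response("Hello!", 1000, 500, cost)))
```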
v0.6.6 - Improving Side-by-side Comparison in the Playground

19th December 2023

  • Enhanced the side-by-side comparison in the playground for better user experience

v0.6.5 - Resolved Batch Logic Issue in Evaluation

18th December 2023

  • Resolved an issue with batch logic in evaluation (users can now run extensive evaluations)

v0.6.4 - Comprehensive Updates and Bug Fixes

12th December 2023

  • Incorporated all chat turns into the chat set
  • Rectified self-hosting documentation
  • Introduced asynchronous support for applications
  • Added ‘register_default’ alias
  • Fixed a bug in the side-by-side feature

v0.6.3 - Integrated File Input and UI Enhancements

12th December 2023

  • Integrated file input feature in the SDK
  • Provided an example that includes images
  • Upgraded the human evaluation view to present larger inputs
  • Fixed issues related to data overwriting in the cloud
  • Implemented UI enhancements to the sidebar

v0.6.2 - Minor Adjustments for Better Performance

7th December 2023

  • Made minor adjustments

v0.6.1 - Bug Fix for Application Saving

7th December 2023

  • Resolved a bug related to saving the application

v0.6.0 - Introduction of Chat-based Applications

1st December 2023

  • Introduced chat-based applications
  • Fixed a bug in the ‘export CSV’ feature in auto evaluation

v0.5.8 - Multiple UI and CSV Reader Fixes

1st December 2023

  • Fixed a bug impacting the CSV reader
  • Addressed an issue where variants were being overwritten
  • Made tabs draggable for better UI navigation
  • Implemented support for multiple LLM keys in the UI

v0.5.7 - Enhanced Self-hosting and Mistral Model Tutorial

17th November 2023

  • Enhanced and simplified self-hosting feature
  • Added a tutorial for the Mistral model
  • Resolved a race condition issue in deployment
  • Fixed an issue with saving in the playground

v0.5.6 - Sentry Integration and User Communication Improvements

12th November 2023

  • Enhanced bug tracking with Sentry integration in the cloud
  • Integrated Intercom for better user communication in the cloud
  • Upgraded to the latest version of OpenAI
  • The CLI now cleans up files after serving

v0.5.5 - Cypress Tests and UI Improvements

2nd November 2023

  • Added extensive Cypress tests for improved application stability
  • Added a collapsible sidebar for better navigation
  • Improved error handling mechanisms
  • Added documentation for the evaluation feature

v0.5 - Launch of SDK Version 2 and Cloud-hosted Version

23rd October 2023

  • Launched SDK version 2
  • Launched the cloud-hosted version
  • Completed a comprehensive refactoring of the application

