
Miscellaneous Improvements

  • The total cost of an evaluation is now displayed in the evaluation table, so you can see how much each evaluation costs and track your expenses.

Bug Fixes

  • Fixed sidebar focus in the automatic evaluation results view
  • Fixed the incorrect URLs shown when running 'agenta variant serve'

Evaluation Speed Increase and Numerous Quality of Life Improvements

v0.13.1-5

  • We've made evaluations 3x faster by batching calls asynchronously.
  • We've added Groq as a new provider, along with Llama3, in our playground.
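The changelog doesn't describe the implementation, but the idea behind asynchronous batching can be sketched as follows: instead of invoking the application once per test-set row and waiting for each response, rows are grouped into batches whose calls are awaited concurrently. The function and field names below are hypothetical, not Agenta's actual API.

```python
import asyncio

async def invoke_app(row: dict) -> dict:
    # Placeholder for a real LLM call; here we just echo the input.
    await asyncio.sleep(0)
    return {"input": row, "output": f"result for {row['id']}"}

async def run_evaluation(rows: list[dict], batch_size: int = 10) -> list[dict]:
    # Process rows in batches; each batch's calls run concurrently,
    # so total wall time is roughly (num_batches x slowest call).
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        results.extend(await asyncio.gather(*(invoke_app(r) for r in batch)))
    return results

rows = [{"id": i} for i in range(25)]
results = asyncio.run(run_evaluation(rows))
print(len(results))  # 25
```

Because asyncio.gather preserves argument order, results still line up with the original test-set rows.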

Bug Fixes

  • Resolved a UI rendering bug in the Testset view.
  • Fixed incorrect URLs displayed when running the 'agenta variant serve' command.
  • Corrected timestamps in the configuration.
  • Resolved errors when using the chat template with empty input.
  • Fixed latency format in evaluation view.
  • Added a spinner to the Human Evaluation results table.
  • Resolved an issue where the gitignore was being overwritten when running 'agenta init'.

Observability (beta)

You can now monitor your application usage in production. We've added a new observability feature (currently in beta), which allows you to:

  • Monitor cost, latency, and the number of calls to your applications in real-time.
  • View the logs of your LLM calls, including inputs, outputs, and used configurations. You can also add any interesting logs to your test set.
  • Trace your more complex LLM applications to understand their internal logic and debug them.

As of now, all newly created applications include observability by default. We are working towards a GA version in the coming weeks, which will be scalable and better integrated with your applications. We will also be adding tutorials and documentation.

Find examples of LLM apps created from code with observability here.


Minor improvements

Toggle variants in comparison view

You can now toggle the visibility of variants in the comparison view, allowing you to compare many variants side by side.

Improvements

  • You can now add a datapoint from the playground to the test set even if there is a column mismatch

Bug fixes

  • Resolved issue with "Start Evaluation" button in Testset view
  • Fixed a CLI bug that prevented variants from being served

New evaluators

We have added more evaluators: a new string-matching evaluator and a Levenshtein distance evaluator.
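For reference, Levenshtein distance is the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming implementation, not Agenta's actual evaluator code.

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the edit distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

An evaluator built on this might, for example, score an output as passing when the distance to the expected answer falls below a configured threshold.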

Improvements

  • Updated documentation for human evaluation
  • Made improvements to Human evaluation card view
  • Added a dialog to indicate when a test set is being saved in the UI

Bug fixes

  • Fixed issue with viewing the full output value during evaluation
  • Enhanced error boundary logic to unblock user interface
  • Improved logic to save and retrieve multiple LLM provider keys
  • Fixed Modal instances to support dark mode

Minor improvements

  • Improved the logic of the Webhook evaluator
  • Made the inputs in the Human evaluation view non-editable
  • Added an option to save a test set in the Single model evaluation view
  • Included the evaluator name in the "Configure your evaluator" modal

Bug fixes

  • Fixed column resize in comparison view
  • Resolved a bug affecting the evaluation output in the CSV file
  • Corrected the path to the Evaluators view when navigating from Evaluations

Highlight output difference when comparing evaluations

We have improved the evaluation comparison view to highlight the differences between the actual and expected outputs.
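This kind of highlighting can be sketched with Python's standard difflib: segments of the actual output are labeled as matching the expected output or changed, and a UI can then render the changed segments with a highlight. The function name and segment labels below are hypothetical, not Agenta's actual code.

```python
import difflib

def diff_segments(expected: str, actual: str) -> list[tuple[str, str]]:
    # Label each piece of the actual output as 'equal' (matches the
    # expected output) or 'changed' (replaced or inserted text).
    matcher = difflib.SequenceMatcher(None, expected, actual)
    segments = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            segments.append(("equal", actual[j1:j2]))
        elif op in ("replace", "insert"):
            segments.append(("changed", actual[j1:j2]))
        # 'delete' covers text present only in the expected output.
    return segments

print(diff_segments("The capital is Paris", "The capital is Lyon"))
```

Since 'equal', 'replace', and 'insert' opcodes together cover the whole actual string, concatenating the segments reconstructs the actual output exactly.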

Improvements

  • Improved the error messages when invoking LLM applications
  • Improved "Add new evaluation" modal
  • Updated the side menu to display 'Configure evaluator' and 'Run evaluator' under the Evaluations section
  • Changed cursor to pointer when hovering over evaluation results

Deployment Versioning and RBAC

Deployment versioning

You now have access to a history of prompts deployed to our three environments. This feature allows you to roll back to previous versions if needed.

Role-Based Access Control

You can now invite team members and assign them fine-grained roles in agenta.

Improvements

  • We now prevent the deletion of test sets that are used in evaluations

Bug fixes

  • Fixed a bug in custom code evaluation aggregation. Until now, the aggregated results for custom code evaluations were not computed correctly.

  • Fixed bug with Evaluation results not being exported correctly

  • Updated the documentation for explaining images with vision GPT

  • Improved Frontend test for Evaluations


Minor fixes

  • Addressed an issue when invoking an LLM app with a missing LLM provider key
  • Updated the LLM providers in the backend enum
  • Fixed bug in variant environment deployment
  • Fixed the sorting in evaluation tables
  • Made use of server timezone instead of UTC