Roadmap
What we've shipped, what we're building now, and what we plan to build next.
Last Shipped
New Evaluation Results Dashboard
9/26/2025
Evaluation
Completely redesigned evaluation results dashboard with performance plots, side-by-side comparison, improved test cases view, focused detail view, configuration visibility, and run naming.
Deep URL Support for Shareable Links
9/24/2025
Misc
URLs now include workspace context, making them shareable between team members. We also fixed workspace bugs affecting page refresh and workspace selection.
Speed Improvements in the Playground
9/19/2025
Playground
We improved the speed of the playground (prompt creation, navigation, etc.), especially for prompts with hundreds of revisions.
Markdown Support
8/7/2025
Playground, Observability
You can now view prompts and messages in Markdown, both in the playground and in the observability drawer.
Image Support in the Playground
7/29/2025
Playground
You can now upload images to the playground and use them in your prompts.
LlamaIndex Integration
6/17/2025
Observability
You can trace your calls from LlamaIndex in one line.
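For reference, enabling this typically looks like the sketch below. It follows the standard OpenTelemetry instrumentor pattern; the exact package names and environment variables (AGENTA_API_KEY, AGENTA_HOST) are assumptions here, so check the integration docs for the authoritative setup.

```python
# Minimal sketch: tracing LlamaIndex calls into Agenta.
# Assumes `agenta`, `llama-index`, and `opentelemetry-instrumentation-llamaindex`
# are installed, and that AGENTA_API_KEY / AGENTA_HOST are set in the environment.
import agenta as ag
from opentelemetry.instrumentation.llamaindex import LlamaIndexInstrumentor
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

ag.init()                              # connect to Agenta using env configuration
LlamaIndexInstrumentor().instrument()  # the "one line" that captures LlamaIndex spans

# Any LlamaIndex usage after this point is traced automatically.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("What is Agenta?")
print(response)
```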
Endpoint to Capture User Feedback for Traces
5/15/2025
Observability
You can now use the annotation API to add annotations (e.g. scores, feedback) to LLM responses traced in Agenta.
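As a rough illustration of what such a call can look like, here is a sketch using plain HTTP. The endpoint path, auth header format, and payload fields below are placeholders, not the documented schema; refer to the annotation API reference for the exact request shape.

```python
# Illustrative sketch only: the endpoint path, auth header, and payload fields
# are placeholders, not the documented Agenta annotation API schema.
import os
import requests

AGENTA_HOST = os.environ.get("AGENTA_HOST", "https://cloud.agenta.ai")
API_KEY = os.environ["AGENTA_API_KEY"]

# trace_id / span_id of the traced LLM response you want to annotate (placeholders)
payload = {
    "annotation": {
        "data": {"outputs": {"score": 0.9, "feedback": "Accurate and concise answer"}},
        "links": {"invocation": {"trace_id": "<trace-id>", "span_id": "<span-id>"}},
    }
}

response = requests.post(
    f"{AGENTA_HOST}/api/preview/annotations/",   # placeholder path
    headers={"Authorization": f"ApiKey {API_KEY}"},  # placeholder auth format
    json=payload,
    timeout=10,
)
response.raise_for_status()
print(response.json())
```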
Tool Support in the Playground
5/10/2025
Playground
You can now define and test tools in the playground. You can save tool definitions as part of your prompts.
Structured Output Support in the Playground
4/15/2025
Playground
We now support structured output in the playground. You can define and validate structured output formats and save them as part of your prompt.
In progress
Online Evaluation
Evaluation
We are adding the ability to configure evaluators (LLM-as-a-judge or custom) and run them automatically on new traces.
Programmatic Evaluation through the SDK
Evaluation
Until now, evaluations could only be run as managed evaluations inside Agenta. We are adding the ability to run them programmatically through the SDK.
Filtering Traces by Annotation
Observability
We are adding the ability to filter traces by annotation. This is useful for finding traces with low scores or feedback.
Date Range Filtering in Metrics Dashboard
Observability
We are adding the ability to filter traces by date range in the metrics dashboard.
Planned
Improving Navigation between Test Sets in the Playground
Playground
We are making it easier to use and navigate the playground with large test sets.
Appending Single Test Cases in the Playground
Playground
The playground currently cannot use test cases from different test sets. We are adding the ability to append a single test case to a test set.
Improving Test Set View
Evaluation
We are reworking the test set view to make it easier to visualize and edit test sets.
Prompt Caching in the SDK
SDK
We are adding the ability to cache prompts in the SDK.
Test Set Versioning
Evaluation
We are adding the ability to version test sets. This is useful for correctly comparing evaluation results.
Tagging Traces, Test Sets, Evaluations and Prompts
Evaluation
We are adding the ability to tag traces, test sets, evaluations and prompts. This is useful for organizing and filtering your data.
Support for built-in LLM Tools (e.g. web search) in the Playground
Playground
We are adding the ability to use built-in LLM tools (e.g. web search) in the playground.
Feature Requests
Upvote or comment on the features you care about, or request a new feature.