Roadmap
What we've shipped, what we're building now, and what we plan to build next.
Last Shipped
Markdown support
8/7/2025
Playground, Observability
You can view prompts and messages rendered as Markdown, both in the playground and in the observability drawer.
Image Support in the Playground
7/29/2025
Playground
You can now upload images to the playground and use them in your prompts.
LlamaIndex Integration
6/17/2025
Observability
You can trace your LlamaIndex calls with a single line of code.
Endpoint to Capture User Feedback for Traces
5/15/2025
Observability
You can now use the annotation API to add annotations (e.g. scores, feedback) to LLM responses traced in Agenta.
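For illustration, a call to an annotation endpoint of this kind might look like the sketch below. The host, path, and field names here are assumptions for the example, not the documented Agenta schema; check the API reference for the real contract.

```python
import json
import urllib.request

AGENTA_HOST = "https://cloud.agenta.ai"  # hypothetical host for this sketch


def build_annotation(trace_id: str, score: float, feedback: str) -> dict:
    """Assemble an annotation payload for a traced LLM response.

    The field names below are illustrative assumptions, not the
    documented Agenta schema.
    """
    return {
        "trace_id": trace_id,
        "annotation": {"score": score, "feedback": feedback},
    }


def post_annotation(api_key: str, payload: dict) -> None:
    """POST the annotation to a hypothetical annotations endpoint."""
    req = urllib.request.Request(
        f"{AGENTA_HOST}/api/annotations",  # hypothetical path
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    urllib.request.urlopen(req)  # raises on HTTP errors


payload = build_annotation("trace-123", score=0.2, feedback="Hallucinated a citation")
```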
Tool Support in the Playground
5/10/2025
Playground
You can now define and test tools in the playground. You can save tool definitions as part of your prompts.
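Tool definitions of this kind usually follow the OpenAI function-calling schema, which most LLM providers accept. Below is a minimal example of the sort of definition you might save alongside a prompt; the `get_weather` tool itself is invented for illustration.

```python
# A tool definition in the OpenAI function-calling format.
# The `get_weather` tool is an invented example.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'Paris'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["city"],
        },
    },
}
```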
Structured Output Support in the Playground
4/15/2025
Playground
The playground now supports structured output. You can define and validate structured output formats and save them as part of your prompt.
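As a sketch, a structured output format is typically a JSON schema in the style of OpenAI's `response_format={"type": "json_schema", ...}` option, and validation checks the model's reply against it. The sentiment schema below is an illustrative example, not a built-in format.

```python
import json

# A structured-output format in the JSON-schema style used by
# OpenAI's `response_format` option. The schema (a sentiment label
# plus confidence) is an invented example.
sentiment_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "sentiment",
        "schema": {
            "type": "object",
            "properties": {
                "label": {
                    "type": "string",
                    "enum": ["positive", "neutral", "negative"],
                },
                "confidence": {"type": "number"},
            },
            "required": ["label", "confidence"],
        },
    },
}


def validate_response(raw: str) -> dict:
    """Minimal check that a model reply carries the schema's required keys."""
    data = json.loads(raw)
    required = sentiment_format["json_schema"]["schema"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    return data


result = validate_response('{"label": "positive", "confidence": 0.93}')
```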
In progress
Evaluation Results Dashboard
Evaluation
We are reworking the evaluation results page to make it more user friendly and informative.
Online Evaluation
Evaluation
Adding the ability to configure evaluators (llm-as-a-judge or custom) and run them automatically on new traces.
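Conceptually, a custom evaluator is just a function that scores a trace, and online evaluation runs every configured evaluator on each new trace as it arrives. The trace shape and evaluator names below are assumptions for illustration, not the Agenta interface.

```python
from typing import Callable, Dict

# Hypothetical sketch: evaluators are plain functions returning a
# score in [0, 1], run automatically against each incoming trace.


def length_guard(trace: dict) -> float:
    """Custom evaluator: penalize empty or very long outputs."""
    n = len(trace["output"])
    return 1.0 if 0 < n <= 2000 else 0.0


def contains_answer(trace: dict) -> float:
    """Custom evaluator: did the output mention the expected keyword?"""
    return 1.0 if trace["expected"].lower() in trace["output"].lower() else 0.0


def run_evaluators(trace: dict, evaluators: Dict[str, Callable[[dict], float]]) -> dict:
    """Score one incoming trace with every configured evaluator."""
    return {name: fn(trace) for name, fn in evaluators.items()}


scores = run_evaluators(
    {"output": "Paris is the capital of France.", "expected": "Paris"},
    {"length_guard": length_guard, "contains_answer": contains_answer},
)
```

An LLM-as-a-judge evaluator would follow the same shape, with the function body calling a model instead of string logic.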
Programmatic Evaluation through the SDK
Evaluation
Until now, evaluations could only be run as workflows managed by Agenta. We are adding the ability to run evaluations programmatically through the SDK.
Speed Improvements in the Playground
Playground
We are improving the speed of the playground, especially for prompts with hundreds of revisions.
Bug Fixes in the Workspace
Misc
We are fixing workspace bugs, especially those affecting page refresh, workspace selection, and shared workspace links.
Filtering Traces by Annotation
Observability
We are adding the ability to filter traces by annotation. This is useful for finding traces with low scores or feedback.
Planned
Improving Navigation between Test Sets in the Playground
Playground
We are making the playground easier to use and navigate with large test sets.
Appending Single Test Cases in the Playground
Playground
The playground currently cannot use test cases from different test sets together. We are adding the ability to append a single test case to a test set.
Improving Test Set View
Evaluation
We are reworking the test set view to make it easier to visualize and edit test sets.
Prompt Caching in the SDK
SDK
We are adding the ability to cache prompts in the SDK.
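The idea can be sketched as a small in-memory cache with a time-to-live: fetched prompt configurations are kept locally so repeated lookups avoid a network round trip. The class and the fetcher below are hypothetical stand-ins, not the Agenta SDK API.

```python
import time


class PromptCache:
    """Hypothetical sketch of SDK-side prompt caching with a TTL."""

    def __init__(self, fetch, ttl_seconds: float = 60.0):
        self._fetch = fetch          # callable: slug -> prompt config
        self._ttl = ttl_seconds
        self._entries = {}           # slug -> (expires_at, config)

    def get(self, slug: str):
        now = time.monotonic()
        hit = self._entries.get(slug)
        if hit and hit[0] > now:
            return hit[1]            # fresh cache hit
        config = self._fetch(slug)   # miss or expired: refetch
        self._entries[slug] = (now + self._ttl, config)
        return config


calls = []


def fake_fetch(slug):
    """Stand-in for the real network call that fetches a prompt."""
    calls.append(slug)
    return {"slug": slug, "template": "Hello {name}"}


cache = PromptCache(fake_fetch, ttl_seconds=60.0)
cache.get("greeting")
cache.get("greeting")  # served from cache; fake_fetch runs only once
```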
Test Set Versioning
Evaluation
We are adding the ability to version test sets. This is useful for correctly comparing evaluation results.
Tagging Traces, Test Sets, Evaluations and Prompts
Evaluation
We are adding the ability to tag traces, test sets, evaluations and prompts. This is useful for organizing and filtering your data.
Support for built-in LLM Tools (e.g. web search) in the Playground
Playground
We are adding the ability to use built-in LLM tools (e.g. web search) in the playground.
Feature Requests
Upvote or comment on the features you care about, or request a new feature.