Roadmap
What we've shipped, what we're building now, and what we plan to build next.
Last Shipped
New Evaluation Results Dashboard
9/26/2025
Evaluation
Completely redesigned evaluation results dashboard with performance plots, side-by-side comparison, improved test cases view, focused detail view, configuration visibility, and run naming.
Deep URL Support for Shareable Links
9/24/2025
Misc
URLs now include workspace context, making them shareable between team members. We also fixed workspace bugs affecting page refresh and workspace selection.
Speed Improvements in the Playground
9/19/2025
Playground
We improved the speed of the playground (prompt creation, navigation, etc.), especially for prompts with hundreds of revisions.
Markdown Support
8/7/2025
Playground, Observability
You can now view prompts and messages in Markdown, both in the playground and in the observability drawer.
Image Support in the Playground
7/29/2025
Playground
You can now upload images to the playground and use them in your prompts.
LlamaIndex Integration
6/17/2025
Observability
You can trace your calls from LlamaIndex in one line.
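For reference, enabling this typically looks like the sketch below. It follows the standard OpenTelemetry instrumentor pattern; the exact package names and environment variables (AGENTA_API_KEY, AGENTA_HOST) are assumptions here, so check the integration docs for the authoritative setup.

```python
# Minimal sketch: tracing LlamaIndex calls into Agenta.
# Assumes `agenta`, `llama-index`, and `opentelemetry-instrumentation-llamaindex`
# are installed, and that AGENTA_API_KEY / AGENTA_HOST are set in the environment.
import agenta as ag
from opentelemetry.instrumentation.llamaindex import LlamaIndexInstrumentor
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

ag.init()                              # connect to Agenta using env configuration
LlamaIndexInstrumentor().instrument()  # the "one line" that captures LlamaIndex spans

# Any LlamaIndex usage after this point is traced automatically.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("What is Agenta?")
print(response)
```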
Endpoint to Capture User Feedback for Traces
5/15/2025
Observability
You can now use the annotation API to add annotations (e.g. scores, feedback) to LLM responses traced in Agenta.
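As a rough illustration of what such a call can look like, here is a sketch using plain HTTP. The endpoint path, auth header format, and payload fields below are placeholders, not the documented schema; refer to the annotation API reference for the exact request shape.

```python
# Illustrative sketch only: the endpoint path, auth header, and payload fields
# are placeholders, not the documented Agenta annotation API schema.
import os
import requests

AGENTA_HOST = os.environ.get("AGENTA_HOST", "https://cloud.agenta.ai")
API_KEY = os.environ["AGENTA_API_KEY"]

# trace_id / span_id of the traced LLM response you want to annotate (placeholders)
payload = {
    "annotation": {
        "data": {"outputs": {"score": 0.9, "feedback": "Accurate and concise answer"}},
        "links": {"invocation": {"trace_id": "<trace-id>", "span_id": "<span-id>"}},
    }
}

response = requests.post(
    f"{AGENTA_HOST}/api/preview/annotations/",   # placeholder path
    headers={"Authorization": f"ApiKey {API_KEY}"},  # placeholder auth format
    json=payload,
    timeout=10,
)
response.raise_for_status()
print(response.json())
```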
Tool Support in the Playground
5/10/2025
Playground
You can now define and test tools in the playground. You can save tool definitions as part of your prompts.
Structured Output Support in the Playground
4/15/2025
Playground
We now support structured output in the playground. You can define and validate structured output formats and save them as part of your prompt.
In progress
Online Evaluation
Evaluation
We are adding the ability to configure evaluators (LLM-as-a-judge or custom) and run them automatically on new traces.
Programmatic Evaluation through the SDK
Evaluation
Until now, evaluations could only be run as managed evaluations inside Agenta. We are adding the ability to run them programmatically through the SDK.
Filtering Traces by Annotation
Observability
We are adding the ability to filter traces by annotation. This is useful for finding traces with low scores or feedback.
Date Range Filtering in Metrics Dashboard
Observability
We are adding the ability to filter traces by date range in the metrics dashboard.
Planned
Improving Navigation between Test Sets in the Playground
Playground
We are making it easier to use and navigate the playground with large test sets.
Appending Single Test Cases in the Playground
Playground
The playground currently cannot use test cases from different test sets. We are adding the ability to append a single test case to a test set.
Improving Test Set View
Evaluation
We are reworking the test set view to make it easier to visualize and edit test sets.
Prompt Caching in the SDK
SDK
We are adding the ability to cache prompts in the SDK.
Test Set Versioning
Evaluation
We are adding the ability to version test sets. This is useful for correctly comparing evaluation results.
Tagging Traces, Test Sets, Evaluations and Prompts
Evaluation
We are adding the ability to tag traces, test sets, evaluations and prompts. This is useful for organizing and filtering your data.
Support for built-in LLM Tools (e.g. web search) in the Playground
Playground
We are adding the ability to use built-in LLM tools (e.g. web search) in the playground.
Feature Requests
Upvote or comment on the features you care about, or request a new feature.