Changelog

New features, improvements, and fixes in Agenta.

19 September 2025 v0.52.5

Speed Improvements in the Playground

We rewrote most of Agenta's frontend. Creating prompts and using the playground are now much faster.

We also made many improvements and fixed bugs:

Improvements:

LLM-as-a-judge prompts now use double curly braces ({{ }}) instead of single curly braces ({ }), matching how regular prompts work. Existing LLM-as-a-judge prompts with single curly braces still work. We also updated the LLM-as-a-judge playground to make editing prompts easier.
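To illustrate the placeholder change, here is a minimal sketch of a renderer that prefers double curly braces and falls back to the legacy single-brace form. The function `render_judge_prompt` and the variable name `prediction` are hypothetical and chosen for illustration; this is not Agenta's actual implementation.

```python
import re

# Hypothetical helper for illustration only, not Agenta's implementation.
DOUBLE_BRACE = re.compile(r"\{\{\s*(\w+)\s*\}\}")
SINGLE_BRACE = re.compile(r"\{(\w+)\}")


def render_judge_prompt(template: str, variables: dict) -> str:
    """Fill {{name}} placeholders; use legacy {name} only if no {{name}} is present."""
    pattern = DOUBLE_BRACE if DOUBLE_BRACE.search(template) else SINGLE_BRACE

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Unknown placeholders are left untouched rather than raising an error.
        return str(variables.get(key, match.group(0)))

    return pattern.sub(substitute, template)


new_style = render_judge_prompt("Rate this answer: {{prediction}}", {"prediction": "Paris"})
old_style = render_judge_prompt("Rate this answer: {prediction}", {"prediction": "Paris"})
print(new_style)  # Rate this answer: Paris
print(old_style)  # Rate this answer: Paris
```

One practical benefit of the double-brace form is that literal JSON in a prompt, such as an output format example, is less likely to be mistaken for a placeholder.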

Self-hosting:

You can now use an external Redis instance for caching by configuring it through an environment variable.

Bug fixes:

Fixed the custom workflow quick start tutorial and examples
Fixed SDK compatibility issues with Python 3.9
Fixed default filtering in observability dashboard
Fixed error handling in the evaluator playground
Fixed the Tracing SDK to allow instrumenting streaming responses and overriding OTEL environment variables

9 September 2025 v0.51.0

Multiple Metrics in Human Evaluation

We rebuilt the human evaluation workflow from scratch. Now you can set multiple evaluators and metrics and use them to score the outputs.

This lets you evaluate the same output on different metrics, such as relevance or completeness. Metrics can be binary or numerical scores, or free-text strings for comments or expected answers.

See the announcement video and post for more details, or check the docs to learn how to use the new human evaluation workflow.

29 August 2025

DSPy Integration

We've added DSPy integration to Agenta. You can now trace and debug your DSPy applications with Agenta.

12 August 2025

Open-sourcing our Product Roadmap

We've made our product roadmap completely transparent and community-driven. You can now see exactly what we're building, what's shipped, and what's coming next. You can also vote on the features that matter most to you.

7 August 2025 v0.50.5

Major Playground Improvements and Enhancements

We've made significant improvements to the playground. Key features include:

Improved error handling in the JSON editor for structured output
JSON field order is now preserved
Visual diff when committing changes
Markdown and text view toggle
Collapsible interface elements
Collapsible test cases for large sets

29 July 2025 v0.50.0

Support for Images in the Playground

Agenta now supports images in the playground, test sets, and evaluations.

17 June 2025 v0.48.4

LlamaIndex Integration

We're excited to announce observability support for LlamaIndex applications.

If you're using LlamaIndex, you can now see detailed traces in Agenta to debug your application.

The integration uses auto-instrumentation: add one line of code and all your LlamaIndex operations are traced.

This helps when you need to understand what's happening inside your RAG pipeline, track performance bottlenecks, or debug issues in production.

We've put together a Jupyter notebook and tutorial to get you started.

15 May 2025 v0.45.0

Annotate Your LLM Response (preview)

One of the most requested features was the ability to attach user feedback and annotations (e.g. scores) to LLM responses traced in Agenta.

Today we're previewing one of a family of features around this topic.

As of today you can use the annotation API to add annotations to LLM responses traced in Agenta.

This is useful to:

Collect user feedback on LLM responses
Run custom evaluation workflows
Measure application performance in real time

See the guide on how to annotate traces from the API for more details, or try our new tutorial (available as a Jupyter notebook).
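As a rough sketch of what attaching feedback to a traced response involves, the snippet below assembles a request body that links a score and comment to a trace. The helper `build_annotation`, every field name, and the example IDs are assumptions made for illustration. The annotation API documentation is the authority on the real endpoint and schema.

```python
import json


def build_annotation(trace_id: str, span_id: str, evaluator_slug: str, outputs: dict) -> dict:
    """Assemble an annotation body linking feedback to a traced LLM response.

    All field names are illustrative assumptions, not the documented schema.
    """
    if not outputs:
        raise ValueError("an annotation needs at least one output, e.g. a score")
    return {
        "annotation": {
            # The feedback itself: scores, labels, or comments.
            "data": {"outputs": outputs},
            # Which evaluator (here, a user feedback form) produced the feedback.
            "references": {"evaluator": {"slug": evaluator_slug}},
            # Which traced invocation the feedback is about.
            "links": {"invocation": {"trace_id": trace_id, "span_id": span_id}},
        }
    }


payload = build_annotation(
    trace_id="0123abcd",          # placeholder ID
    span_id="4567ef",             # placeholder ID
    evaluator_slug="user-feedback",  # hypothetical evaluator name
    outputs={"score": 1, "comment": "Helpful answer"},
)
body = json.dumps(payload)  # JSON string to send as the request body
```

Keeping the trace link separate from the feedback data means the same response can receive several annotations, for example one from an end user and one from a reviewer.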

Other stuff:

The migration process now takes a couple of minutes instead of an hour.

10 May 2025 v0.43.1

Tool Support in the Playground

We released tool support in the Agenta playground, a key feature for anyone building agents with LLMs.

Agents need tools to access external data, perform calculations, or call APIs.

Now you can:

Define tools directly in the playground using JSON schema
Test how your prompt generates tool calls in real time
Preview how your agent handles tool responses
Verify tool call correctness with custom evaluators

The tool schema is saved with your prompt configuration, making integration easy when you fetch configs through the API.
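As an illustration of a JSON schema tool definition and a simple correctness check, the sketch below assumes the widely used OpenAI-style function tool format. The `get_weather` tool and the `check_tool_call` helper are hypothetical, and the check covers only tool name, required arguments, and enum values rather than full JSON Schema validation.

```python
import json

# OpenAI-style function tool definition; the tool itself is hypothetical.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}


def check_tool_call(tool: dict, name: str, arguments_json: str) -> list:
    """Return a list of problems found in a model-generated tool call (empty if none)."""
    problems = []
    spec = tool["function"]
    if name != spec["name"]:
        problems.append(f"unexpected tool name: {name}")
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return problems + ["arguments are not valid JSON"]
    if not isinstance(args, dict):
        return problems + ["arguments must be a JSON object"]
    params = spec["parameters"]
    # Every required argument must be present.
    for key in params.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    # Every supplied argument must be declared and respect any enum constraint.
    for key, value in args.items():
        prop = params["properties"].get(key)
        if prop is None:
            problems.append(f"unknown argument: {key}")
        elif "enum" in prop and value not in prop["enum"]:
            problems.append(f"invalid value for {key}: {value}")
    return problems


good = check_tool_call(weather_tool, "get_weather", '{"city": "Berlin", "unit": "celsius"}')
bad = check_tool_call(weather_tool, "get_weather", '{"unit": "kelvin"}')
print(good)  # []
print(bad)   # ['missing required argument: city', 'invalid value for unit: kelvin']
```

A check along these lines could serve as the core of a custom evaluator that scores whether a prompt produces well-formed tool calls.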

2 May 2025 v0.43.0

Documentation Overhaul, New Models, and Platform Improvements

We've made significant improvements across Agenta with a major documentation overhaul, new model support, self-hosting enhancements, and UI improvements.

Revamped Prompt Engineering Documentation:

We've completely rewritten our prompt management and prompt engineering documentation.

Start exploring the new documentation in our updated Quick Start Guide.

New Model Support:

Our platform now supports several new models:

Google's Gemini 2.5 Pro and Flash
Alibaba Cloud's Qwen 3
OpenAI's GPT-4.1

These models are available in both the playground and through the API.

Playground Enhancements:

We've added a draft state to the playground, providing a better editing experience. Changes are now clearly marked as drafts until committed.

Self-Hosting Improvements:

We've significantly simplified the self-hosting experience by changing how environment variables are handled in the frontend:

No more rebuilding images to change ports or domains
Dynamic configuration through environment variables at runtime

Check out our updated self-hosting documentation for details.

Bug Fixes and Optimizations:

Fixed OpenTelemetry integration edge cases
Resolved edge cases in the API that affected certain workflow configurations
Improved UI responsiveness and fixed minor visual inconsistencies
Added chat support in cloud