Changelog

New features, improvements, and fixes in Agenta.

v0.52.5

Speed Improvements in the Playground

We rewrote most of Agenta's frontend. Creating prompts and using the playground are now much faster.

We also made many improvements and fixed bugs:

Improvements:

  • LLM-as-a-judge prompts now use double curly braces ({{ and }}) for variables instead of single curly braces ({ and }). This matches how regular prompts work. Existing LLM-as-a-judge prompts with single curly braces still work. We also updated the LLM-as-a-judge playground to make editing prompts easier.
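As a rough illustration of the placeholder change, the sketch below renders a template that uses double curly braces and falls back to single curly braces for older prompts. This is not Agenta's implementation: the function name `render_prompt` and the variable names `output` and `correct_answer` are hypothetical and chosen only for the example.

```python
import re

# Hypothetical helper for illustration only; not Agenta's actual rendering code.
_DOUBLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")      # matches {{name}} or {{ name }}
_SINGLE = re.compile(r"(?<!\{)\{(\w+)\}(?!\})")   # matches legacy {name}


def render_prompt(template: str, variables: dict) -> str:
    """Fill placeholders, preferring the double-brace syntax."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched rather than raising.
        return str(variables[name]) if name in variables else match.group(0)

    if _DOUBLE.search(template):
        # New-style prompt: only {{name}} is a placeholder, so literal
        # single braces (for example JSON snippets) pass through unchanged.
        return _DOUBLE.sub(substitute, template)
    # Legacy prompt: fall back to single-brace placeholders.
    return _SINGLE.sub(substitute, template)


new_style = render_prompt(
    "Compare {{output}} with {{ correct_answer }}.",
    {"output": "Paris", "correct_answer": "Paris"},
)
legacy = render_prompt("Compare {output}.", {"output": "Paris"})
```

One reason to prefer double braces is that a judge prompt containing literal JSON, which is full of single braces, no longer collides with the placeholder syntax.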

Self-hosting:

Bug fixes:

  • Fixed the custom workflow quick start tutorial and examples
  • Fixed SDK compatibility issues with Python 3.9
  • Fixed default filtering in observability dashboard
  • Fixed error handling in the evaluator playground
  • Fixed the Tracing SDK to allow instrumenting streaming responses and overriding OTEL environment variables

Read more →

v0.51.0

Multiple Metrics in Human Evaluation

We rebuilt the human evaluation workflow from scratch. Now you can set multiple evaluators and metrics and use them to score the outputs.

This lets you evaluate the same output on different metrics, such as relevance or completeness. You can create binary or numerical scores, or use strings for comments or expected answers.
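To make the idea of several typed metrics per output concrete, here is a minimal sketch of how such scores could be modeled. The `Metric` class, the `score_output` function, and the metric names are assumptions made for illustration; they do not describe Agenta's data model or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

MetricValue = Union[bool, float, str]


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric definition: a name plus one of three kinds."""
    name: str
    kind: str  # "binary", "numerical", or "string"

    def validate(self, value: MetricValue) -> MetricValue:
        if self.kind == "binary" and isinstance(value, bool):
            return value
        # bool is a subclass of int, so exclude it from numerical scores.
        if (self.kind == "numerical" and isinstance(value, (int, float))
                and not isinstance(value, bool)):
            return float(value)
        if self.kind == "string" and isinstance(value, str):
            return value
        raise ValueError(f"{value!r} is not a valid {self.kind} score for {self.name}")


def score_output(metrics: List[Metric], scores: Dict[str, MetricValue]) -> Dict[str, MetricValue]:
    """Validate one annotator's scores for a single output against its metrics."""
    by_name = {metric.name: metric for metric in metrics}
    return {name: by_name[name].validate(value) for name, value in scores.items()}


metrics = [
    Metric("relevance", "numerical"),
    Metric("is_complete", "binary"),
    Metric("comment", "string"),
]
result = score_output(
    metrics,
    {"relevance": 4, "is_complete": True, "comment": "Misses the second question."},
)
```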

Watch the video below and read the post for more details. Or check out the docs to learn how to use the new human evaluation workflow.

Read more →

DSPy Integration

We've added DSPy integration to Agenta. You can now trace and debug your DSPy applications with Agenta.

Read more →

Open-sourcing our Product Roadmap

We've made our product roadmap completely transparent and community-driven. You can now see exactly what we're building, what's shipped, and what's coming next. You can also vote on the features that matter most to you.

Read more →

v0.50.5

Major Playground Improvements and Enhancements

We've made significant improvements to the playground. Key features include:

  • Improved error handling in the JSON editor for structured output
  • JSON field order is now preserved
  • Visual diff when committing changes
  • Markdown and text view toggle
  • Collapsible interface elements
  • Collapsible test cases for large sets

Read more →

v0.48.4

LlamaIndex Integration

We're excited to announce observability support for LlamaIndex applications.

If you're using LlamaIndex, you can now see detailed traces in Agenta to debug your application.

The integration uses auto-instrumentation: add one line of code and all your LlamaIndex operations are traced.

This helps when you need to understand what's happening inside your RAG pipeline, track performance bottlenecks, or debug issues in production.

We've put together a Jupyter notebook and a tutorial to get you started.

Read more →

v0.45.0

Annotate Your LLM Response (preview)

One of our most requested features was the ability to attach user feedback and annotations (e.g. scores) to LLM responses traced in Agenta.

Today we're previewing one of a family of features around this topic.

As of today you can use the annotation API to add annotations to LLM responses traced in Agenta.

This is useful to:

  • Collect user feedback on LLM responses
  • Run custom evaluation workflows
  • Measure application performance in real-time

See the how-to guide on annotating traces from the API for more details, or try our new tutorial (available as a Jupyter notebook).

Other changes:

  • The migration process now takes a couple of minutes instead of an hour.

Read more →

v0.43.1

Tool Support in the Playground

We released tool usage in the Agenta playground, a key feature for anyone building agents with LLMs.

Agents need tools to access external data, perform calculations, or call APIs.

Now you can:

  • Define tools directly in the playground using JSON schema
  • Test how your prompt generates tool calls in real-time
  • Preview how your agent handles tool responses
  • Verify tool call correctness with custom evaluators

The tool schema is saved with your prompt configuration, making integration easy when you fetch configs through the API.
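The sketch below shows what a tool defined with JSON schema can look like, together with a small check of the kind a custom evaluator might run on a generated tool call. It assumes the widely used OpenAI-style function-tool layout; the `get_weather` tool and the `check_tool_call` helper are hypothetical, and the check covers only required fields, unknown fields, and enum values rather than full JSON Schema validation.

```python
import json

# Illustrative tool definition in the OpenAI-style function-tool layout (an assumption).
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}


def check_tool_call(tool: dict, call_name: str, call_arguments: str) -> bool:
    """Return True if the call names this tool and its arguments fit the schema."""
    spec = tool["function"]
    if call_name != spec["name"]:
        return False
    try:
        args = json.loads(call_arguments)  # models emit arguments as a JSON string
    except json.JSONDecodeError:
        return False
    if not isinstance(args, dict):
        return False
    params = spec["parameters"]
    properties = params["properties"]
    # Every required argument must be present.
    if any(key not in args for key in params.get("required", [])):
        return False
    for key, value in args.items():
        if key not in properties:
            return False  # argument not declared in the schema
        allowed = properties[key].get("enum")
        if allowed is not None and value not in allowed:
            return False
    return True


ok = check_tool_call(get_weather_tool, "get_weather", '{"city": "Berlin", "unit": "celsius"}')
bad_unit = check_tool_call(get_weather_tool, "get_weather", '{"city": "Berlin", "unit": "kelvin"}')
```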

Read more →

v0.43.0

Documentation Overhaul, New Models, and Platform Improvements

We've made significant improvements across Agenta with a major documentation overhaul, new model support, self-hosting enhancements, and UI improvements.

Revamped Prompt Engineering Documentation:

We've completely rewritten our prompt management and prompt engineering documentation.

Start exploring the new documentation in our updated Quick Start Guide.

New Model Support:

Our platform now supports several new models:

  • Google's Gemini 2.5 Pro and Flash
  • Alibaba Cloud's Qwen 3
  • OpenAI's GPT-4.1

These models are available in both the playground and through the API.

Playground Enhancements:

We've added a draft state to the playground, providing a better editing experience. Changes are now clearly marked as drafts until committed.

Self-Hosting Improvements:

We've significantly simplified the self-hosting experience by changing how environment variables are handled in the frontend:

  • No more rebuilding images to change ports or domains
  • Dynamic configuration through environment variables at runtime

Check out our updated self-hosting documentation for details.

Bug Fixes and Optimizations:

  • Fixed OpenTelemetry integration edge cases
  • Resolved edge cases in the API that affected certain workflow configurations
  • Improved UI responsiveness and fixed minor visual inconsistencies
  • Added chat support in cloud

Read more →