
Changelog

New features, improvements, and fixes in Agenta.

v0.81.0

Navigation Links from Traces to App/Environment/Variant

You can now click directly from any trace to the application, variant, or environment that generated it. Links appear in both the trace table and drawer view. This makes debugging faster since you can jump straight to the configuration that produced a specific output.

To enable navigation links, store references in your traces using the Python SDK (ag.tracing.store_refs()) or OpenTelemetry span attributes. See the reference prompt versions guide for details.

v0.74.0

Test Set Versioning and New Test Set UI

Test sets now have versioning. Every edit, upload, or programmatic update creates a new version. Evaluations link to specific versions, so you can compare results knowing they used the same test data.
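The versioning behavior described above can be pictured with a small in-memory model. This is only an illustrative sketch, not Agenta's implementation or SDK: the `Snapshot` and `VersionedDataset` classes and their methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable copy of the rows at one point in time (hypothetical)."""
    number: int
    rows: tuple


@dataclass
class VersionedDataset:
    """Hypothetical model: every update appends a new version, none are overwritten."""
    name: str
    versions: list = field(default_factory=list)

    def commit(self, rows):
        # Copy each row so later edits by the caller cannot alter a stored version.
        snapshot = Snapshot(
            number=len(self.versions) + 1,
            rows=tuple(dict(row) for row in rows),
        )
        self.versions.append(snapshot)
        return snapshot

    def get(self, number):
        # Version numbers start at 1.
        return self.versions[number - 1]


test_set = VersionedDataset("capitals")
v1 = test_set.commit([{"country": "France", "capital": "Paris"}])
v2 = test_set.commit([
    {"country": "France", "capital": "Paris"},
    {"country": "Japan", "capital": "Tokyo"},
])

# An evaluation stores the version it ran against, so it can always
# be re-read with exactly the same rows, even after later edits.
evaluation = {"test_set": test_set.name, "test_set_version": v1.number}
pinned_rows = test_set.get(evaluation["test_set_version"]).rows
```

Because the evaluation records a version number rather than "the latest rows", two evaluations that point at the same version are guaranteed to have used identical test data.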

The test set UI is completely rebuilt. It handles hundreds of thousands of rows without slowing down. Editing is much easier, especially for chat messages. You can view and edit complex JSON directly, toggle between raw and formatted views, and choose whether columns store strings or JSON.

v0.73.0

Playground UX Improvements

Three quality-of-life improvements to the Playground:

  • The model selection dropdown now shows provider costs per million tokens.
  • You can run evaluations directly from the Playground without navigating to the evaluation menu.
  • You can collapse test cases to navigate large test sets more easily.

v0.73.0

Chat Sessions in Observability

You can now track multi-turn conversations with chat sessions. All traces with the same session ID are automatically grouped together, letting you analyze complete conversations instead of individual requests.

The new session browser shows key metrics like total cost, latency, and token usage per conversation. Open any session to see all traces with their parent-child relationships. This makes debugging chatbots and AI assistants much easier. Add session tracking with one line of code using either our Python SDK or OpenTelemetry.
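The grouping and per-conversation metrics described above can be sketched as a simple aggregation. This is a conceptual illustration only, assuming a hypothetical trace record with `session_id`, `cost`, `latency_ms`, and `tokens` fields. It does not use the Agenta SDK, and the real trace schema may differ.

```python
from collections import defaultdict

# Hypothetical trace records; the field names are assumptions for this example.
traces = [
    {"session_id": "s1", "cost": 0.002, "latency_ms": 420, "tokens": 310},
    {"session_id": "s1", "cost": 0.003, "latency_ms": 510, "tokens": 450},
    {"session_id": "s2", "cost": 0.001, "latency_ms": 200, "tokens": 120},
]


def summarize_sessions(trace_list):
    """Group traces by session ID and total cost, latency, and tokens per session."""
    sessions = defaultdict(
        lambda: {"traces": 0, "cost": 0.0, "latency_ms": 0, "tokens": 0}
    )
    for trace in trace_list:
        totals = sessions[trace["session_id"]]
        totals["traces"] += 1
        totals["cost"] += trace["cost"]
        totals["latency_ms"] += trace["latency_ms"]
        totals["tokens"] += trace["tokens"]
    return dict(sessions)


summary = summarize_sessions(traces)
for session_id, totals in summary.items():
    print(session_id, totals)
```

Each key in `summary` corresponds to one conversation, which is the unit the session browser reports on instead of individual requests.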

Minor improvements:

  • Added time filtering to the analytics dashboard. You can now view metrics for the last 6 hours, 24 hours, 7 days, or 30 days.
  • Added the ability to batch delete multiple traces at once. Select traces using checkboxes and delete them in a single operation.
v0.73.0

JSON Multi-Field Match Evaluator

The new JSON Multi-Field Match evaluator validates multiple fields between JSON objects. Configure any number of field paths using dot notation, JSON Path, or JSON Pointer formats. Each field gets its own score (0 or 1), and an aggregate score shows the percentage of matching fields. This evaluator is ideal for entity extraction tasks like validating extracted names, emails, and addresses. The UI automatically detects fields from your test data for quick setup. This replaces the old JSON Field Match evaluator, which only supported single fields.
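The scoring scheme described above can be sketched in a few lines. This is a simplified illustration rather than the evaluator's actual code: it handles dot notation only (not JSON Path or JSON Pointer), the function names are hypothetical, and it treats a field missing from either object as a non-match, which is an assumption.

```python
_MISSING = object()  # sentinel for "path not found"


def get_by_dot_path(obj, path):
    """Resolve a dot-notation path such as 'address.city' in nested dicts and lists."""
    current = obj
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return _MISSING
    return current


def multi_field_match(expected, actual, field_paths):
    """Score each field 0 or 1, then report the fraction of matching fields."""
    scores = {}
    for path in field_paths:
        want = get_by_dot_path(expected, path)
        got = get_by_dot_path(actual, path)
        # Assumption: a field missing on either side counts as a mismatch.
        matched = want is not _MISSING and got is not _MISSING and want == got
        scores[path] = 1 if matched else 0
    aggregate = sum(scores.values()) / len(scores) if scores else 0.0
    return {"scores": scores, "aggregate": aggregate}


expected = {"name": "Ada Lovelace", "email": "ada@example.com",
            "address": {"city": "London"}}
actual = {"name": "Ada Lovelace", "email": "ada@example.org",
          "address": {"city": "London"}}

result = multi_field_match(expected, actual, ["name", "email", "address.city"])
print(result["scores"])
print(round(result["aggregate"] * 100, 1), "% of fields match")
```

In this example the name and city match while the email does not, so two of the three fields score 1 and the aggregate is two thirds.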

v0.69.0

PDF Support in the Playground

The Playground now supports PDF attachments for chat applications. You can attach PDFs by uploading files, providing URLs, or using file IDs from provider APIs. This works with vision-capable models and extends to evaluations and observability. You can now build and test document processing applications like invoice analysis or contract review.

v0.68.3

Agenta Documentation MCP Server

AI coding agents like Cursor, Claude Code, and VS Code Copilot can now access Agenta documentation directly through the Agenta MCP server. Connect your IDE to get instant answers about Agenta features, APIs, and code examples without leaving your editor. The server supports multiple clients and requires no authentication.

v0.65.0

Projects within Organizations

You can now create projects within an organization. This lets you divide your work between different AI products. Each project scopes its prompts, traces, and evaluations. Create a new project or navigate between projects directly from the sidebar.

v0.66.0

Provider Built-in Tools in the Playground

You can now use provider built-in tools in the Playground. Add web search, code execution, file search, and Bash scripting tools directly to your prompts. Supported providers include OpenAI, Anthropic, and Gemini. Tools are saved with your prompt configuration and automatically used when you invoke prompts through the LLM gateway.

v0.62.5

Reasoning Effort Support in the Playground

You can now configure reasoning effort for models that support this parameter, such as OpenAI's o1 series and Google's Gemini 2.5 Pro. The reasoning effort setting is part of your prompt template, making it available when you fetch prompts via the SDK or invoke them through Agenta as an LLM gateway.
