Introduction
Dify is an open-source, model-agnostic platform for agentic AI. It unifies visual workflows, a full RAG Knowledge Pipeline, and LLMOps so teams can ship production-ready agents fast, whether self-hosted or in the cloud.
But speed alone isn’t enough—you need to know your applications are sound. As your AI applications and agents get more complex, keeping them accurate and efficient becomes a real challenge.
By leveraging observability features, you can begin to answer questions like:
Are your agents taking the most efficient paths?
Is your chosen model the right one in terms of token usage, latency, and cost?
How well are your retrieval steps contributing to output quality?
Observability isn’t just a production concern; it’s also essential during development, helping you catch silent errors, monitor costs, and understand LLM and agent behavior before issues reach your users. And once your application is live, that same visibility ensures it stays reliable as you scale.
This is where products like Arize Phoenix and Arize AX come in, giving you one-click observability, performance insights, experimentation tools, and evaluation pipelines that let you bring Dify applications into production with confidence.
Observability shouldn’t slow you down. It should be as seamless as building your drag-and-drop workflow.

Arize Phoenix & Dify
When you’re building with Dify, you get the flexibility to spin up LLM-powered workflows in no time. But as those agents grow more complex, tracing weird behaviors, debugging failures, and actually improving quality (instead of just hoping it’s better) starts to matter. That’s where Phoenix steps in.
Arize Phoenix is your open-source observability layer for LLM apps, plugging right into your Dify workflows so you can actually see what your agents are doing. Every model call, tool invocation, and chain step your agents execute gets traced automatically, so you’re not left guessing why a prompt tweak worked (or made things worse). Inputs, outputs, latencies, and metadata all show up, making it easy to debug and optimize without hunting through logs.
Phoenix goes beyond just tracing. It lets you annotate your collected traces, build structured test datasets, create tailored evaluations, and run tests to measure exactly how your agents are performing before you ship changes. That way, you can keep moving fast with Dify while staying confident that your workflows aren’t silently breaking.

Phoenix + Dify: Sample Use Case for Improving Your Agents
Configure your Dify application with Phoenix
In Dify’s monitoring tab, drop in your Phoenix credentials, and tracing is good to go.
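For local development, you can stand up Phoenix yourself and point Dify at it. A minimal sketch, assuming the open-source arize-phoenix package:

```python
# A minimal sketch: launch a local Phoenix instance to collect traces.
# Assumes the package is installed: pip install arize-phoenix
import phoenix as px

session = px.launch_app()  # starts the Phoenix UI and trace collector
print(session.url)         # typically http://localhost:6006; use this as
                           # the endpoint when configuring Dify
```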
Collect traces
Run your Dify agent as usual, and Phoenix will automatically capture structured traces for every conversation and task.
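Traces are browsable in the Phoenix UI, and you can also pull them out programmatically. A quick sketch using Phoenix’s Python client:

```python
# Pull captured spans into a pandas DataFrame for ad-hoc inspection.
import phoenix as px

client = px.Client()  # connects to the running Phoenix instance
spans_df = client.get_spans_dataframe()
print(spans_df.head())  # one row per span: inputs, outputs, latencies, metadata
```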
Build a dataset for evaluation
Hop into Phoenix, grab traces that capture key user flows, tricky edge cases, and examples where your agent struggles. Save these examples as a dataset so you can use it as a reference point to evaluate performance changes over time.
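You can build that dataset in the UI, or programmatically from the spans you just collected. A sketch, where the dataset name is illustrative and the column names follow Phoenix’s span schema:

```python
import phoenix as px

client = px.Client()
# Grab the LLM spans that capture the flows you care about; the filter
# string follows Phoenix's span query syntax.
spans_df = client.get_spans_dataframe("span_kind == 'LLM'")

# Save a slice of them as a named dataset for later evaluation runs.
dataset = client.upload_dataset(
    dataset_name="dify-agent-regressions",  # illustrative name
    dataframe=spans_df.head(20),
    input_keys=["attributes.input.value"],
    output_keys=["attributes.output.value"],
)
```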
Iterate and Experiment
Use Phoenix’s LLM span replay and prompt playground to test prompt tweaks and model changes against your dataset. Compare outputs side by side to see how your changes affect results on real examples.
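Experiments can be driven from code too. A sketch, where call_dify_app is a hypothetical helper that calls back into your Dify application with the candidate change:

```python
import phoenix as px
from phoenix.experiments import run_experiment

client = px.Client()
dataset = client.get_dataset(name="dify-agent-regressions")

def task(input):
    # call_dify_app is a hypothetical helper that invokes your Dify app
    # (e.g. over its HTTP API) with the tweaked prompt or swapped model.
    return call_dify_app(input["attributes.input.value"])

experiment = run_experiment(dataset, task, experiment_name="prompt-v2")
```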
Define and Run Evaluators
Set up and run evaluators, such as correctness, helpfulness, or relevance checks, on your experiment results. Your original dataset serves as a reference point, helping you see where changes improve outputs on previously failing cases and flag regressions on examples that were working before.
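Evaluators can be plain functions or LLM judges. A sketch of an LLM-judged correctness check with Phoenix’s evals library, where the template and the eval_df DataFrame (your experiment’s inputs, references, and outputs) are illustrative:

```python
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative judge prompt; {input}, {reference}, and {output} must be
# column names in eval_df.
CORRECTNESS_TEMPLATE = """You are comparing a new answer to a reference.
Question: {input}
Reference answer: {reference}
New answer: {output}
Answer with exactly one word: correct or incorrect."""

results = llm_classify(
    dataframe=eval_df,  # inputs, references, and outputs to judge
    model=OpenAIModel(model="gpt-4o-mini"),
    template=CORRECTNESS_TEMPLATE,
    rails=["correct", "incorrect"],  # constrain the judge's labels
    provide_explanation=True,
)
```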
Deploy with confidence
Update your Dify application with tested changes. Keep tracing, evaluating, and refining with Phoenix as your agents grow.
Arize AX & Dify
While Arize Phoenix is a fantastic tool for iterating quickly—tracing your agents, testing prompt and model changes, and running structured offline evaluations—there comes a point when you need continuous visibility as your LLM workflows scale in production.
Arize AX is the answer to scalability. It builds on Phoenix’s observability with live evaluations on production data, dashboards to watch your metrics over time, and monitors that flag unexpected changes as they happen. Arize helps you keep a pulse on your Dify workflows in the real world, making it easier to catch regressions, understand user impact, and confidently roll out updates as your usage grows.

Arize + Dify: Sample Use Case for Monitoring and Iterating on Agents
Connect your Dify app to Arize
In Dify’s monitoring tab, enter your Arize credentials, and traces start flowing automatically.
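Dify handles this wiring for you; for reference, the same connection made by hand looks roughly like this, assuming the arize-otel helper package:

```python
# Manual equivalent of Dify's built-in integration, for reference.
# Assumes the helper package is installed: pip install arize-otel
from arize.otel import register

tracer_provider = register(
    space_id="YOUR_SPACE_ID",    # from your Arize AX workspace settings
    api_key="YOUR_API_KEY",
    project_name="dify-agents",  # how traces are grouped in Arize
)
```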
Stream production data into Arize
As users interact with your Dify workflows, Arize captures structured traces in real time, logging model call details, tool usage, and any metadata you are interested in.
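Because the integration speaks OpenTelemetry, standard span attributes ride along with whatever Dify emits. A sketch of tagging a span with custom metadata, where the span name and attribute key are illustrative:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Attach custom metadata to a span so it is filterable in Arize;
# the attribute key and value here are illustrative.
with tracer.start_as_current_span("dify-request") as span:
    span.set_attribute("metadata.user_tier", "pro")
```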
Set up online evaluations
Spin up online evaluators, like accuracy, safety, or user frustration checks, to automatically score your agent’s outputs on live traffic without manual spot-checking.
Monitor key metrics on dashboards
Use Arize dashboards to track evaluation scores, token usage, latency, and cost trends in one place, so you can see at a glance what’s changing.
Configure alerts and monitors
Set up monitors and alerts to catch drift, regressions, or sudden spikes before they sneak into your end users’ experience.
Iterate with confidence
Use insights from your dashboards and alerts to guide prompt or model tweaks in Dify. As you ship updates, keep tracking their real-world impact with online and offline evaluations, ensuring your changes actually move the needle for your users.
Ready to Level Up Your Dify AI Agents?
Observability doesn’t just keep you moving fast; it keeps you moving smart. Get started for free with either tool:
Arize Phoenix: If you want something fast and open source, focused largely on development and iteration.
Arize AX: If your application is in production and you need continuous monitoring of live traffic.
