Introduction
Dify is an open-source, model-agnostic platform for agentic AI. It unifies visual workflows, a full RAG Knowledge Pipeline, and LLMOps so teams can ship production-ready agents fast, whether self-hosted or in the cloud.
But speed alone isn’t enough—you need to know your applications are sound. As your AI applications and agents get more complex, keeping them accurate and efficient becomes a real challenge.
By leveraging observability features, you can begin to answer questions like:
Are your agents taking the most efficient paths?
Is your chosen model the right one in terms of token usage, latency, and cost?
How well are your retrieval steps contributing to output quality?
Observability isn’t just a production concern; it’s also essential during development, helping you catch silent errors, monitor costs, and understand LLM and agent behavior before issues reach your users. And once your application is live, that same visibility ensures it stays reliable as you scale.
This is where products like Arize Phoenix and Arize AX come in, giving you one-click observability, performance insights, experimentation tools, and evaluation pipelines that let you bring Dify applications into production with confidence.
Observability shouldn’t slow you down. It should be as seamless as building your drag-and-drop workflow.

Arize Phoenix & Dify
When you’re building with Dify, you get the flexibility to spin up LLM-powered workflows in no time. But as those agents grow more complex, tracing weird behaviors, debugging failures, and actually improving quality (instead of just hoping it’s better) starts to matter. That’s where Phoenix steps in.
Arize Phoenix is your open-source observability layer for LLM apps, plugging right into your Dify workflows so you can actually see what your agents are doing. Every model call, tool invocation, and chain step your agents execute gets traced automatically, so you’re not left guessing why a prompt tweak worked (or made things worse). Inputs, outputs, latencies, and metadata all show up, making it easy to debug and optimize without hunting through logs.
Phoenix goes beyond just tracing. It lets you annotate your collected traces, build structured test datasets, create tailored evaluations, and run tests to measure exactly how your agents are performing before you ship changes. That way, you can keep moving fast with Dify while staying confident that your workflows aren’t silently breaking.

Phoenix + Dify: Sample Use Case for Improving Your Agents
Configure your Dify application with Phoenix
In Dify’s monitoring tab, drop in your Phoenix credentials, and tracing is good to go.
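For local development, you can stand up Phoenix yourself and point Dify at it. A minimal sketch, assuming the open-source arize-phoenix package:

```python
# A minimal sketch: launch a local Phoenix instance to collect traces.
# Assumes the package is installed: pip install arize-phoenix
import phoenix as px

session = px.launch_app()  # starts the Phoenix UI and trace collector
print(session.url)         # typically http://localhost:6006; use this as
                           # the endpoint when configuring Dify
```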
Collect traces
Run your Dify agent as usual, and Phoenix will automatically capture structured traces for every conversation and task.
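Traces are browsable in the Phoenix UI, and you can also pull them out programmatically. A quick sketch using Phoenix’s Python client:

```python
# Pull captured spans into a pandas DataFrame for ad-hoc inspection.
import phoenix as px

client = px.Client()  # connects to the running Phoenix instance
spans_df = client.get_spans_dataframe()
print(spans_df.head())  # one row per span: inputs, outputs, latencies, metadata
```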
Build a dataset for evaluation
Hop into Phoenix, grab traces that capture key user flows, tricky edge cases, and examples where your agent struggles. Save these examples as a dataset so you can use it as a reference point to evaluate performance changes over time.
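You can build that dataset in the UI, or programmatically from the spans you just collected. A sketch, where the dataset name is illustrative and the column names follow Phoenix’s span schema:

```python
import phoenix as px

client = px.Client()
# Grab the LLM spans that capture the flows you care about; the filter
# string follows Phoenix's span query syntax.
spans_df = client.get_spans_dataframe("span_kind == 'LLM'")

# Save a slice of them as a named dataset for later evaluation runs.
dataset = client.upload_dataset(
    dataset_name="dify-agent-regressions",  # illustrative name
    dataframe=spans_df.head(20),
    input_keys=["attributes.input.value"],
    output_keys=["attributes.output.value"],
)
```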
Iterate and Experiment
Use Phoenix’s LLM span replay and prompt playground to test prompt tweaks and model changes against your dataset. Compare outputs side by side to see how your changes affect results on real examples.
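Experiments can be driven from code too. A sketch, where call_dify_app is a hypothetical helper that calls back into your Dify application with the candidate change:

```python
import phoenix as px
from phoenix.experiments import run_experiment

client = px.Client()
dataset = client.get_dataset(name="dify-agent-regressions")

def task(input):
    # call_dify_app is a hypothetical helper that invokes your Dify app
    # (e.g. over its HTTP API) with the tweaked prompt or swapped model.
    return call_dify_app(input["attributes.input.value"])

experiment = run_experiment(dataset, task, experiment_name="prompt-v2")
```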
Define and Run Evaluators
Set up and run evaluators, such as correctness, helpfulness, or relevance checks, on your experiment results. Your original dataset serves as a reference point, helping you see where changes improve outputs on previously failing cases and flag regressions on examples that were working before.
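Evaluators can be plain functions or LLM judges. A sketch of an LLM-judged correctness check with Phoenix’s evals library, where the template and the eval_df DataFrame (your experiment’s inputs, references, and outputs) are illustrative:

```python
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative judge prompt; {input}, {reference}, and {output} must be
# column names in eval_df.
CORRECTNESS_TEMPLATE = """You are comparing a new answer to a reference.
Question: {input}
Reference answer: {reference}
New answer: {output}
Answer with exactly one word: correct or incorrect."""

results = llm_classify(
    dataframe=eval_df,  # inputs, references, and outputs to judge
    model=OpenAIModel(model="gpt-4o-mini"),
    template=CORRECTNESS_TEMPLATE,
    rails=["correct", "incorrect"],  # constrain the judge's labels
    provide_explanation=True,
)
```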
Deploy with confidence
Update your Dify application with tested changes. Keep tracing, evaluating, and refining with Phoenix as your agents grow.
Arize AX & Dify
While Arize Phoenix is a fantastic tool for iterating quickly—tracing your agents, testing prompt and model changes, and running structured offline evaluations—there comes a point when you need continuous visibility as your LLM workflows scale in production.
Arize AX is the answer to scalability. It builds on Phoenix’s observability with live evaluations on production data, dashboards to watch your metrics over time, and monitors that flag unexpected changes as they happen. Arize helps you keep a pulse on your Dify workflows in the real world, making it easier to catch regressions, understand user impact, and confidently roll out updates as your usage grows.

Arize + Dify: Sample Use Case for Monitoring and Iterating on Agents
Connect your Dify app to Arize
In Dify’s monitoring tab, enter your Arize credentials, and traces start flowing automatically.
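Dify handles this wiring for you; for reference, the same connection made by hand looks roughly like this, assuming the arize-otel helper package:

```python
# Manual equivalent of Dify's built-in integration, for reference.
# Assumes the helper package is installed: pip install arize-otel
from arize.otel import register

tracer_provider = register(
    space_id="YOUR_SPACE_ID",    # from your Arize AX workspace settings
    api_key="YOUR_API_KEY",
    project_name="dify-agents",  # how traces are grouped in Arize
)
```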
Stream production data into Arize
As users interact with your Dify workflows, Arize captures structured traces in real time, logging model call details, tool usage, and any metadata you are interested in.
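Because the integration speaks OpenTelemetry, standard span attributes ride along with whatever Dify emits. A sketch of tagging a span with custom metadata, where the span name and attribute key are illustrative:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Attach custom metadata to a span so it is filterable in Arize;
# the attribute key and value here are illustrative.
with tracer.start_as_current_span("dify-request") as span:
    span.set_attribute("metadata.user_tier", "pro")
```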
Set up online evaluations
Spin up online evaluators, like accuracy, safety, or user frustration checks, to automatically score your agent’s outputs on live traffic without manual spot-checking.
Monitor key metrics on dashboards
Use Arize dashboards to track evaluation scores, token usage, latency, and cost trends in one place, so you can see at a glance what’s changing.
Configure alerts and monitors
Set up monitors and alerts to catch drift, regressions, or sudden spikes before they sneak into your end users’ experience.
Iterate with confidence
Use insights from your dashboards and alerts to guide prompt or model tweaks in Dify. As you ship updates, keep tracking their real-world impact with online and offline evaluations, ensuring your changes actually move the needle for your users.
Ready to Level Up Your Dify AI Agents?
Observability doesn’t just keep you moving fast; it keeps you moving smart. Get started for free with either tool:
Arize Phoenix: If you want something fast and open source, focused largely on development and iteration.
Arize AX: If your application is in production and you need continuous monitoring of live traffic.
