How to

Adding Multimodal Capabilities to DeepSeek R1 Using Dify

On Dify, you can quickly build a bidirectional collaborative system based on DeepSeek R1 and multi-modal models through visual workflow design.

Steven

Technical Writer

Written on

Feb 8, 2025



Introduction

Less than a month after the DeepSeek V3 model sparked heated discussions in the industry, DeepSeek has once again launched a new model, R1, setting off another wave in the global artificial intelligence field. If V3 demonstrated that top-tier model performance could be achieved with low-cost training thanks to its impressive cost-effectiveness, R1 represents a qualitative leap in terms of technology. This open-source model not only inherits the characteristic of high cost-effectiveness but also attracts the attention of leading AI researchers worldwide with its unique training methods and emergent reasoning abilities.

In many tests, DeepSeek R1 has demonstrated remarkable reasoning capabilities. DeepSeek R1-Zero's accuracy on the AIME math competition climbed from an initial 15.6% to 71.0%, with multiple attempts reaching 86.7%. In another test, the model also exhibited strong transfer learning ability, outperforming 96.3% of human participants on the programming contest platform Codeforces. These results clearly demonstrate that R1-Zero isn't simply memorizing problem-solving patterns, but has genuinely mastered deep mathematical intuition and universal reasoning capabilities.

While DeepSeek R1 is powerful, it currently has some pain points, such as its lack of multimodal capabilities. The DeepSeek web version offers file upload and web access, but these two features cannot be enabled simultaneously.

To address the aforementioned issues, we leveraged Dify, an open-source LLM Ops tool, for low-code development. When developing LLM products with Dify, you only need to focus on product design without worrying about code implementation. By simply dragging and adding nodes, you can quickly transform ideas into runnable products and deploy them.

Rather than using DeepSeek R1 directly as the output model, we will use its output as a pre-processing reasoning step to enhance a more powerful multimodal model that lacks deep reasoning capabilities. Furthermore, we will use Dify's beta Plugin feature to package the resulting LLM application as an OpenAI-formatted API, allowing it to be integrated with other tools.
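Once the application is packaged as an OpenAI-formatted API, any OpenAI-compatible client can call it. The sketch below only assembles the request payload; the endpoint path and the `dify-app` model name are placeholders for whatever your plugin exposes, not real identifiers:

```python
def build_chat_request(question: str, model: str = "dify-app") -> dict:
    """Assemble an OpenAI-format chat completion payload.

    The model name is a placeholder for the name your Dify
    plugin registers; the payload shape follows the OpenAI
    Chat Completions format.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }

payload = build_chat_request("Summarize the attached report.")
# This payload could then be POSTed to <your-endpoint>/chat/completions
# with an "Authorization: Bearer <app-key>" header.
```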

Dify: Low-Code Integration and Development of DeepSeek Applications

On Dify, you can quickly build a bidirectional collaborative system based on DeepSeek R1 and multi-modal models through visual workflow design.

First, you need to log in to Dify and select "Create Blank Application" -> "Chatflow".

File Upload and Doc Extractor

Dify v0.10.0 added a file upload feature, which works together with a Doc Extractor node to parse uploaded files into text that LLMs can read.

You can enable and set file types in "Features" -> "File Upload".
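Conceptually, the Doc Extractor's job is simply to turn an uploaded file into plain text the LLM node can consume. For a plain-text file, that step amounts to the following simplified illustration (not Dify's actual implementation, which also handles PDF, DOCX, and other formats):

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Read an uploaded file into plain text for the LLM.

    This sketch covers only plain-text files; a real extractor
    dispatches on file type (PDF, DOCX, etc.) before producing text.
    """
    return Path(path).read_text(encoding="utf-8")
```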

DeepSeek R1 Node (LLM Node): "The Top Student's" In-Depth Reasoning

First, you need to obtain and add your DeepSeek API Key in "Settings" -> "Model Providers".

If you are using the community or enterprise version, please ensure that Dify is the latest version.

DeepSeek R1 plays the role of the "top student", focusing on problem breakdown and logical reasoning. Its core task is to output the complete thought process rather than directly providing answers.

When writing system prompts, it is recommended to use a structured format such as XML, which helps the model decompose the problem into subtasks.

<Role>
You are an LLM with reasoning capabilities.
Unlike other LLMs, you can output your complete thinking process.
</Role>
<Task>
Your task is to assist other LLMs that lack reasoning capabilities.
You need to output complete thinking processes for other LLMs based on user questions.
<Steps>
"Step 1": "Receive questions from users."
"Step 2": "Conduct deep reasoning and analysis on user questions."
"Step 3": "Elaborate on the reasoning process and logic, ensuring the process is complete and easy to understand."
"Step 4": "Output the complete reasoning process, no final answer needed."
</Steps>
</Task>
<Limitations>
Do not output the final answer, only output the thinking process.
Do not explain your own capabilities or limitations.
</Limitations>
In addition, we need to adjust the user input content, adding the content from the doc extractor:
<User Query>
{{Start}}
</User Query>
<file>
{{text}}
</file>

Note that the two input variables are wrapped in XML tags, which helps the LLM distinguish the user query from the file content. You can reference a previous node's variables by typing { or /.
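The same assembly can be sketched in code. The helper below mirrors the XML template above, wrapping the user query and the Doc Extractor output in their respective tags; the function itself is illustrative, not part of Dify:

```python
def build_r1_input(user_query: str, file_text: str) -> str:
    """Wrap the user query and the doc-extractor output in XML tags
    so the reasoning model can tell the two sources apart."""
    return (
        f"<User Query>\n{user_query}\n</User Query>\n"
        f"<file>\n{file_text}\n</file>"
    )

prompt = build_r1_input(
    "What are the key risks in this report?",
    "Q3 revenue fell 12% year over year...",
)
```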

Gemini Node (LLM Node): Multi-Modal Implementation

Gemini is a multimodal model with strong visual capabilities. Here it relies on R1's reasoning output to combine multimodal data and generate the final answer; its strengths lie in image parsing and refining results.

The system prompt is as follows:

<Role>
You are an LLM that excels at learning.
</Role>
<Task>
You need to learn from others' thinking processes about problems, enhance your results with their thinking, and then provide your answer.
<Steps>
"Step 1": "Receive thinking process from DeepSeek-R1 model."
"Step 2": "Carefully study and understand DeepSeek-R1's reasoning logic and steps."
"Step 3": "Generate final answer based on DeepSeek-R1's thinking, combined with image capabilities."
"Step 4": "Output the final answer, no need to explain the thinking process."
</Steps>
</Task>
<Limitations>
Do not repeat DeepSeek-R1's thinking process, only output the final answer.
Do not explain your own capabilities or learning process.
Ensure the answer is accurate and relevant to the question.
</Limitations>

In addition, you need to enable Vision in this node so that the model can accept image inputs.
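Conceptually, the Chatflow chains the two nodes: R1 produces a reasoning trace, and Gemini consumes that trace together with the question (and any image). A pure-Python sketch of that hand-off, where the `call_r1` and `call_gemini` callables are placeholders for the actual model calls Dify makes:

```python
from typing import Callable

def two_stage_answer(
    question: str,
    call_r1: Callable[[str], str],      # placeholder: DeepSeek R1 call
    call_gemini: Callable[[str], str],  # placeholder: Gemini call (with Vision)
) -> str:
    """First obtain a reasoning trace from R1, then feed it to Gemini."""
    reasoning = call_r1(question)
    gemini_input = (
        f"<Thinking Process>\n{reasoning}\n</Thinking Process>\n"
        f"<Question>\n{question}\n</Question>"
    )
    return call_gemini(gemini_input)

# Stubbed example with dummy model calls:
answer = two_stage_answer(
    "What does the chart show?",
    call_r1=lambda q: "Step 1: identify the axes...",
    call_gemini=lambda p: "The chart shows rising revenue.",
)
```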

Try it Now

You can now immediately pull this demo from the Explore page to your application list:

English: Deploy to Dify

Chinese: Deploy to Dify
