Introduction
Less than a month after the DeepSeek V3 model sparked heated discussion in the industry, DeepSeek has launched another new model, R1, setting off a fresh wave in the global AI field. If V3 demonstrated that top-tier model performance could be achieved with low-cost training, R1 represents a qualitative leap in technology. This open-source model not only inherits V3's cost-effectiveness but has also drawn the attention of leading AI researchers worldwide with its unique training method and emergent reasoning abilities.
In many tests, DeepSeek R1 has demonstrated remarkable reasoning capabilities. On the AIME math competition, DeepSeek R1-Zero's accuracy climbed from an initial 15.6% to 71.0%, and with majority voting over multiple attempts it reached 86.7%. The model also exhibited strong generalization: on the competitive programming platform Codeforces, it outperformed 96.3% of human participants. These results demonstrate that R1-Zero isn't simply memorizing problem-solving patterns, but has genuinely acquired deep mathematical intuition and general-purpose reasoning ability.
![](https://framerusercontent.com/images/9aal8nSK2z3C00moS0REYpsqw.png)
While DeepSeek R1 is powerful, it currently has some pain points, such as its lack of multimodal capabilities. The DeepSeek website does offer file upload and web search, but these two functions cannot be enabled at the same time.
To address these issues, we used Dify, an open-source LLMOps tool, for low-code development. When building LLM products with Dify, you only need to focus on product design, not code implementation: by simply dragging and connecting nodes, you can quickly turn ideas into runnable products and deploy them.
Rather than using DeepSeek R1 directly as the output model, we will use its output as a pre-processing reasoning step, lending deep reasoning ability to a more powerful multimodal model that lacks it. Furthermore, we will use Dify's beta Plugin feature to package the finished LLM application as an OpenAI-formatted API, allowing it to be integrated with other tools.
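To give a sense of what that integration looks like from the caller's side, here is a minimal sketch of invoking such an OpenAI-formatted endpoint with the official openai Python SDK. The base URL, API key, and model name are placeholders for whatever your own Dify deployment exposes, not fixed values.

```python
from openai import OpenAI

# Hypothetical endpoint exposed by the Dify plugin; substitute the URL,
# API key, and model name your own deployment provides.
client = OpenAI(
    base_url="https://your-dify-host/v1",  # placeholder
    api_key="your-dify-app-api-key",       # placeholder
)

response = client.chat.completions.create(
    model="dify-app",  # placeholder model name
    messages=[
        {"role": "user", "content": "Summarize the attached quarterly report."},
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI format, any tool that already knows how to call OpenAI can call this application without modification.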
Dify: Low-Code Integration and Development of DeepSeek Applications
On Dify, you can use visual workflow design to quickly build a collaborative system that pairs DeepSeek R1 with a multimodal model.
![](https://framerusercontent.com/images/yItFVpwFknDLNPFJhKz6k8nM6Go.png)
First, you need to log in to Dify and select "Create Blank Application" -> "Chatflow".
File Upload and Doc Extractor
Dify v0.10.0 added a file upload feature, which works together with a Doc Extractor node to parse uploaded files into text that LLMs can read.
You can enable file upload and configure the allowed file types under "Features" -> "File Upload".
![](https://framerusercontent.com/images/r14rorHSm0CMjVYFlXcGfAOlH0.png)
![](https://framerusercontent.com/images/qF9yZrqWUTtvM3VcyLrxL5EpzE.png)
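Conceptually, the Doc Extractor node does by configuration what the short sketch below does by hand: turn an uploaded file into plain text that downstream LLM nodes can read. The sketch uses the pypdf library purely as an illustration of the idea; it is not Dify's actual implementation, and the file name is a placeholder.

```python
from pypdf import PdfReader

def extract_text(path: str) -> str:
    """Roughly what a doc extractor does: file in, plain text out."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

document_text = extract_text("uploaded_report.pdf")  # placeholder file name
```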
DeepSeek R1 Node (LLM Node): The "Top Student's" In-Depth Reasoning
First, you need to obtain and add your DeepSeek API Key in "Settings" -> "Model Providers".
If you are using the Community or Enterprise edition, please make sure Dify is updated to the latest version.
![](https://framerusercontent.com/images/mCz50NS7i7yiL5tjorGVvVsajBg.png)
DeepSeek R1 plays the role of the "top student", focusing on problem breakdown and logical reasoning. Its core task is to output the complete thought process rather than directly providing answers.
When writing the system prompt, it is recommended to use a structured format such as XML, which helps the model break the task down. For this node, the prompt should include rules like the following:
Do not output the final answer, only output the thinking process.
Do not explain your own capabilities or limitations.
Note that the two input variables are enclosed in XML tags, which helps the LLM understand them. You can reference variables from previous nodes by typing { or /.
![](https://framerusercontent.com/images/OYc260x1QPA7ayTkZ3He6ghRZGc.png)
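For illustration, an XML-structured system prompt for this node might look like the sketch below, shown here as a Python string. The tag names and the two variable placeholders ({{query}} and {{file_text}}) are assumptions; in Dify you would insert your Chatflow's actual variables by typing { or / as described above.

```python
# A minimal sketch of an XML-structured system prompt for the R1 node.
# Tag names and the two variables ({{query}}, {{file_text}}) are hypothetical.
R1_SYSTEM_PROMPT = """
<task>
  Break the problem down and reason through it step by step.
  Do not output the final answer, only output the thinking process.
  Do not explain your own capabilities or limitations.
</task>
<question>{{query}}</question>
<context>{{file_text}}</context>
"""
```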
Gemini Node (LLM Node): Multimodal Implementation
Gemini is a multimodal model with strong visual capabilities. It relies on R1's reasoning framework, combining it with multimodal data to generate the final answer; its strengths lie in image parsing and refining the result.
The system prompt is as follows:
![](https://framerusercontent.com/images/4nY3STNXcaWnpmD6kN6J9OL21E.png)
In addition, you need to enable the vision setting in this node so that the LLM can process images.
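Outside of Dify, the same division of labor can be sketched in a few lines of Python: R1 is prompted to produce only the thinking process, and Gemini combines that reasoning with the image to produce the final answer. DeepSeek's API is OpenAI-compatible; the Gemini model name, prompt wording, and file name below are assumptions for illustration.

```python
from openai import OpenAI
import google.generativeai as genai
from PIL import Image

# Stage 1: DeepSeek R1 produces the reasoning trace.
# DeepSeek's API is OpenAI-compatible; "deepseek-reasoner" is the R1 model.
deepseek = OpenAI(base_url="https://api.deepseek.com", api_key="DEEPSEEK_API_KEY")
reasoning = deepseek.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system",
         "content": "Output only your step-by-step thinking, not the final answer."},
        {"role": "user",
         "content": "The user uploaded a sales chart. Plan how to determine its overall trend."},
    ],
).choices[0].message.content

# Stage 2: Gemini combines the reasoning trace with the image to answer.
genai.configure(api_key="GEMINI_API_KEY")
gemini = genai.GenerativeModel("gemini-1.5-pro")  # any vision-capable Gemini model
answer = gemini.generate_content([
    f"Use this reasoning as your framework:\n{reasoning}\n\n"
    "Now answer the question based on the attached image.",
    Image.open("sales_chart.png"),  # placeholder image path
])
print(answer.text)
```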
Try it Now
You can now pull this demo from the Explore page straight into your application list:
English: Deploy to Dify
Chinese: Deploy to Dify