
Introduction to DifySandbox

This post outlines the rationale, design principles, and implementation mechanisms behind DifySandbox.

Yeuoly, Backend Engineer

Written on Jul 10, 2024


Hi, it’s Yeuoly! I’m a backend engineer at Dify.AI and the creator of DifySandbox.

Community users should already be familiar with the DifySandbox service, a Docker service that runs in the background of Dify. We've received a lot of feedback about it, but most users know little about DifySandbox itself or its internal details. This post walks you through what happens inside DifySandbox.

DifySandbox for Code Execution and Security

In Dify, Workflow is a core feature that lets users orchestrate logic flows through drag-and-drop, enabling relatively complex business logic. Data processing is an essential part of this orchestration, and we typically encounter the following scenarios:

We often need to process JSON text generated by LLMs to extract structured data from it. Similarly, XML or JSON text received from HTTP requests must be parsed into structured data before information can be extracted from it.
In some cases, we need to combine the output of two Knowledge Retrieval nodes, or merge the results of a Google Search node with those of a Knowledge Retrieval node.
In other cases, users with basic programming skills may want to use template languages such as Jinja2 or Liquid for more flexible prompt orchestration.

Although these scenarios are diverse, they all involve data processing and call for a unified solution. The natural answer is code, which is far more versatile than a collection of highly specialized data processing nodes; nobody wants a dedicated node just for parsing JSON text. So why not give users a code editor where they can implement their own data processing logic? We could even wrap the LLM node's prompt in a template language to make prompt orchestration more flexible. All of the issues above can be solved this way.
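As a rough illustration of the first two scenarios, here is the kind of function a user might write in such a code editor. It is a sketch loosely modeled on the `main(...) -> dict` entry point used by Dify's Code node templates; the argument names `llm_output`, `results_a`, and `results_b` are hypothetical, and the retrieval results are assumed to be lists of plain strings.

```python
import json
import re

# LLMs often wrap JSON in a Markdown code fence (three backticks, optionally
# followed by "json"). The pattern uses `{3} to match the three backticks.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def main(llm_output: str, results_a: list, results_b: list) -> dict:
    # Hypothetical Code-node entry point: inputs arrive as arguments,
    # outputs are returned as a dict of named variables.
    match = FENCE.search(llm_output)
    payload = match.group(1) if match else llm_output
    data = json.loads(payload)

    # Merge two retrieval result lists, dropping duplicates but keeping order.
    merged = list(dict.fromkeys(results_a + results_b))
    return {"data": data, "merged": merged}
```

Whether the JSON arrives fenced or bare, `data` ends up as a parsed object, and `merged` keeps the first occurrence of each result.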

Because Dify must execute user-written code, it must also address security: in the hands of a malicious user, code execution becomes a vulnerability rather than a normal feature.

Most requests made to Dify are not intended to cause harm. Whether a request runs code or accesses workspace information, it is routed through Dify. Executing code means starting a new Python or Node.js process on the server and sending the user's code to that process for execution.

The problem arises when that process falls into the hands of a malicious user. Because the process runs directly on the server, it has access to both the file system and the database. Malicious code could read any file on the server, or even access and manipulate the entire Dify database.
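To make the risk concrete, here is a minimal sketch of the naive approach: spawn a fresh interpreter and pipe the user's code into it. The function name `run_untrusted_naively` is hypothetical, and this is deliberately not how DifySandbox runs code; it only shows that, without isolation, user code inherits the server's user, file system, environment variables, and network access.

```python
import subprocess
import sys


def run_untrusted_naively(code: str, timeout: float = 5.0) -> str:
    """Run user code in a new Python process and return its stdout.

    WARNING: no isolation at all. The child runs as the same OS user as the
    server, sees the same file system and environment variables, and can
    open network connections, so it can do anything the server can.
    """
    proc = subprocess.run(
        [sys.executable, "-"],  # "-" tells Python to read the program from stdin
        input=code,
        capture_output=True,
        text=True,
        timeout=timeout,  # the only safeguard here: a wall-clock limit
    )
    return proc.stdout
```

For example, passing `print(open('/etc/passwd').read())` on a Linux server returns that file's contents verbatim, and the same works for any file the server process can read.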

To address this, we developed DifySandbox, a code sandbox that blocks malicious code while allowing normal business operations to continue. The following sections provide a detailed overview of DifySandbox.

Design and Implementation of DifySandbox

Before we began the design process, we carefully considered several security requirements for the sandbox.

Developers around the world have different coding preferences. In the LLM space, Python and Node.js are clearly the top two choices, and rather than forcing users to pick one, we want to support both. The sandbox's technical solution therefore should not be tied to a specific language; it should be a comprehensive, system-level solution.
Can the sandbox be bypassed? Preventing this entirely is impossible, since no system is absolutely secure. We therefore cannot rely on the sandbox alone for security; we must ensure that even if the sandbox has vulnerabilities, attackers still cannot reach core resources.
The network is a major concern with far-reaching consequences.
- Many sandbox escapes in production environments exploit network vulnerabilities; VMware, for example, has had numerous vulnerabilities exploited through the network.
- Even with code-level and system-level protections in place, it is still hard to stop malicious code from sniffing the internal network and illegally accessing internal resources. The sandbox must therefore also isolate the network.

We also examined existing sandbox solutions. Here is an overview of their pros and cons:

WebAssembly: This is a popular sandbox approach that compiles the Python or Node.js interpreter to WebAssembly, which allows Python code to run inside Node.js or a web browser. Used correctly, WebAssembly provides strong system-level security. However, it has limitations: installing third-party dependencies can cause problems such as architecture incompatibility, and Python and Node.js require different handling, which makes WebAssembly less versatile.
Docker: Some providers start a new Docker container for each code execution request. This approach is very flexible but extremely slow, taking over a second per run; in a workflow with 10 code nodes, that adds up to 10+ seconds, a significant performance drawback. It also requires managing containers and mounting the Docker daemon's socket into Dify's API container, which introduces serious security risks. The alternative, Docker-in-Docker, is even slower, so neither option is ideal.
Language-specific sandbox packages: These include vm2 for Node.js and PyPy's sandbox for Python. However, each is limited to a single language, comes with its own restrictions, and requires following language-specific conventions when managing dependencies. For example, PyPy imposes strict Python version limitations, and managing dependencies with it is not easy. Libraries such as Pyodide exist, but the main problem remains that Node.js and Python solutions are not interchangeable, which makes maintenance difficult.
Kernel extensions: Some well-established sandbox solutions use kernel extensions to limit process behavior, but they often come with complex configuration documentation and startup procedures. Examples include Sandboxie and Judge0. Despite being kernel-level solutions, they are not immune to misconfiguration: Judge0 suffered a severe CVE caused by configuration issues. These solutions also require privileged containers for their kernel extensions, so if their restrictions are bypassed, Docker's isolation becomes ineffective as well.

None of the existing solutions fully met our business requirements: each fell short somewhere, whether in execution speed, language support, or security. We therefore decided to build our own solution with the following key features:

Multi-layer isolation: We use Docker containers for one layer of isolation, but in a different way. Instead of creating a new container for each task, we launch a single sandbox container at startup. This container runs an internal HTTP service that receives code execution requests from our system. Note that this approach restricts the system to running on Linux; on Windows and macOS, tools such as Docker Desktop or OrbStack are required.
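The single-container design can be sketched with Python's standard library: one long-lived HTTP service that accepts code and runs each request in a fresh child process. The `/run` path, the `{"code": ...}` request schema, the response fields, and port 8194 are assumptions for illustration, not DifySandbox's actual API, and unlike the real sandbox, this child process is not confined by any further isolation layer.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RunHandler(BaseHTTPRequestHandler):
    """Handles POST /run with a JSON body {"code": "..."} (hypothetical schema)."""

    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        try:
            # One fresh child process per request, inside the one long-lived
            # container. A real sandbox would confine this child further.
            proc = subprocess.run(
                [sys.executable, "-"],
                input=request["code"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            result = {"stdout": proc.stdout, "stderr": proc.stderr,
                      "exit_code": proc.returncode}
        except subprocess.TimeoutExpired:
            result = {"stdout": "", "stderr": "timed out", "exit_code": None}
        self._send_json(result)

    def _send_json(self, obj):
        payload = json.dumps(obj).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(host: str = "127.0.0.1", port: int = 8194) -> ThreadingHTTPServer:
    """Create the long-lived service; port 8194 is an assumed value."""
    return ThreadingHTTPServer((host, port), RunHandler)
```

Calling `make_server(host="0.0.0.0").serve_forever()` inside the container would let other containers on the same internal network POST code to it. Compared with one container per request, only process start-up is paid per execution.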
System-level isolation: On Linux, Docker is a common system-level sandbox. However, since we already use one layer of Docker, we instead apply one of Docker's underlying technologies directly: Seccomp (Secure Computing Mode).
Seccomp acts as a filter on every attempt to access the system. It intercepts and controls a wide range of operations, including file reads and writes, system configuration changes, network access, and even standard input/output. This works because all of these operations are system calls (syscalls), and each syscall represents an attempt to access the system. The flow of these operations is listed below.
By implementing Seccomp, we can capture any attempts made by a process to execute unauthorized syscalls. This typically includes capturing file access, process creation, disk mounting, and system modifications.

However, each CPU architecture has its own syscall table with a large number of syscalls. For instance, the write syscall is number 1 on amd64 but 64 on arm64. AMD64 has over 300 syscalls, while arm64 has more than 400. With a blacklist strategy, any syscall left off the list is allowed by default, which poses significant security risks. DifySandbox therefore uses a whitelist strategy, permitting only the syscalls it needs and intercepting everything else.
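Real Seccomp filters are compiled to BPF and loaded into the kernel (for example through libseccomp), but the policy logic can be modeled in a few lines of Python. The sketch below is a simplified user-space model, not DifySandbox's filter: `SYSCALLS` holds a handful of real Linux syscall numbers per architecture, and `decide` is a hypothetical helper showing why each architecture needs its own numbers and why a blacklist lets unlisted syscalls through while a whitelist does not.

```python
# A few real Linux syscall numbers. x86_64 has its own table; arm64 uses
# the generic table, which has no plain "open" (only "openat").
SYSCALLS = {
    "amd64": {"read": 0, "write": 1, "open": 2, "close": 3, "socket": 41,
              "execve": 59, "ptrace": 101, "mount": 165, "exit_group": 231,
              "openat": 257},
    "arm64": {"mount": 40, "openat": 56, "close": 57, "read": 63, "write": 64,
              "exit_group": 94, "ptrace": 117, "socket": 198, "execve": 221},
}


def decide(arch: str, nr: int, names: set, mode: str) -> str:
    """Return "ALLOW" or "KILL" for syscall number `nr` on `arch`.

    mode="whitelist": only the named syscalls are allowed.
    mode="blacklist": only the named syscalls are blocked.
    Names missing from an architecture's table are skipped.
    """
    table = SYSCALLS[arch]
    numbers = {table[name] for name in names if name in table}
    if mode == "whitelist":
        return "ALLOW" if nr in numbers else "KILL"
    if mode == "blacklist":
        return "KILL" if nr in numbers else "ALLOW"
    raise ValueError(f"unknown mode: {mode}")
```

With a whitelist of `{"read", "write", "close", "exit_group"}`, `decide("amd64", 59, ...)` (execve) returns `"KILL"`; with a blacklist of `{"mount", "ptrace"}`, the same call returns `"ALLOW"` simply because execve was never listed.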

At the file system level, we need to give the sandbox's child processes a virtualized file system, separating the file system of the sandbox host container from that of the processes running user code. The main reason is that Seccomp can only allow or deny file access as a whole; it cannot tell individual files apart. For finer-grained control, some files, such as Python's third-party dependencies, must remain accessible as usual, while sensitive files such as /etc/passwd must be inaccessible. This calls for a separate, isolated file system, which is where Linux's chroot (change root) comes in. It changes a process's root directory to another directory. For example, after a Python process calls chroot("/tmp"), running ls / from that process shows only the files that were originally under /tmp. This effectively isolates the file system. As shown in the diagram below, each sandbox folder ultimately becomes the root directory of a specific sandbox process.
However, chroot can be bypassed. A process that calls chroot again on a subdirectory can then use "cd .." to reach directories outside the original chroot environment that should be inaccessible. Because calling chroot requires root privileges, we change the process's privileges before running user-written code, switching it from root to a non-root user and group. Some system calls, such as openat, can also potentially bypass chroot restrictions, but these issues can be addressed with more detailed protective measures, which we won't elaborate on here.
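A minimal sketch of the chroot and privilege-drop steps, using Python's `os` module on Linux. The helper `run_jailed` and the choice of UID/GID 65534 (conventionally `nobody`/`nogroup`) are assumptions for illustration; the real sandbox also applies Seccomp and other protections not shown here. The order matters: `chroot`, `setgroups`, and `setgid` need root, so they must run before `setuid` gives it up.

```python
import os


def run_jailed(jail_dir: str, uid: int, gid: int, fn) -> int:
    """Run fn() in a child process confined to jail_dir as uid/gid.

    Returns the child's exit code: 0 if fn() completed, 1 if jailing
    failed (e.g. not running as root) or fn() raised. Linux only.
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.chroot(jail_dir)  # "/" now refers to jail_dir (requires root)
            os.chdir("/")        # don't keep a working directory outside the jail
            os.setgroups([])     # drop supplementary groups while still root
            os.setgid(gid)       # change group before giving up root...
            os.setuid(uid)       # ...because after this, setgid/chroot are denied
            fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Inside `fn`, `os.listdir("/")` lists only the jail's contents, and any further `os.chroot` call fails with `PermissionError`, which closes the "chroot again, then cd .." route described above.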
From a networking perspective, handling everything directly within the Docker sandbox is not a good idea. First, Docker's restrictions on iptables make it difficult to configure system-level network policies outside a Kubernetes environment. Second, network configuration is complex and needs a flexible approach, because there is no one-size-fits-all strategy. Moreover, network isolation policies differ significantly between k8s and docker-compose. We therefore provide separate solutions for k8s and docker-compose.
- In docker-compose, we create an internal network for the sandbox and add a proxy container that connects it to the external network, acting as the sandbox's intermediary. Configuring rules on this proxy container, using a proxy service such as Squid, gives us flexible control.
  The network structure is as follows: the sandbox is located in SSRF_PROXY_NET, an internal network that cannot access external resources. The proxy container belongs to both this network and the DEFAULT network. This setup lets the proxy act as the intermediary for external access and allows highly flexible proxy rules to be configured to keep the intranet secure.
  Dify itself already uses a proxy container, which the sandbox can reuse. Because certain features, such as HTTP nodes, may also pose security risks, all such connections are managed uniformly through the proxy's rule configuration.
- When using k8s, everything becomes easier. Configuring egress rules is all you need to do.
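Both deployments come down to the same rule: sandboxed code may reach public addresses but not internal ones. The sketch below expresses that rule twice: as a hypothetical proxy-side check (`egress_allowed`) of the kind a Squid ACL would enforce, and as a Kubernetes NetworkPolicy manifest built as a Python dict. The `app: dify-sandbox` label, the policy name, and the blocked CIDR list are assumptions; a real deployment would tune them.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Private and link-local IPv4 ranges the sandbox should never reach
# (169.254.0.0/16 includes the 169.254.169.254 cloud metadata endpoint).
BLOCKED_CIDRS = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16"]
_BLOCKED = [ipaddress.ip_network(c) for c in BLOCKED_CIDRS]


def egress_allowed(url: str, resolve=socket.gethostbyname) -> bool:
    """Proxy-side check: allow a request only if its host is a public address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)  # the URL contains a literal IP
    except ValueError:
        addr = ipaddress.ip_address(resolve(host))  # a hostname: resolve it first
    if addr.is_loopback or any(addr in net for net in _BLOCKED):
        return False
    return True


def sandbox_network_policy() -> dict:
    """Kubernetes NetworkPolicy: sandbox pods may reach DNS and public IPs only."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "sandbox-egress"},  # hypothetical name
        "spec": {
            "podSelector": {"matchLabels": {"app": "dify-sandbox"}},  # hypothetical label
            "policyTypes": ["Egress"],
            "egress": [
                # DNS lookups on port 53 to any destination.
                {"ports": [{"protocol": "UDP", "port": 53},
                           {"protocol": "TCP", "port": 53}]},
                # All other traffic only to addresses outside the blocked ranges.
                {"to": [{"ipBlock": {"cidr": "0.0.0.0/0", "except": BLOCKED_CIDRS}}]},
            ],
        },
    }
```

Sharing `BLOCKED_CIDRS` keeps the two deployments consistent. A production proxy should also connect to the exact address it checked, so that a hostname cannot resolve to a public IP during the check and a private one afterwards (DNS rebinding). Since `kubectl apply -f -` accepts JSON, `json.dumps(sandbox_network_policy())` can be applied directly.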

Conclusion

DifySandbox is a sandbox runtime that relies on Linux's native capabilities rather than on any specific programming language. Its main goal is to give Dify a secure environment for executing code, so that user code runs safely while enabling richer functionality. To ensure security, we apply comprehensive isolation across multiple aspects, including system, disk, file system, network, and permissions, and we use multi-layered isolation to avoid relying on privileged operations.

However, there are still some limitations to address. For example, managing dependencies for Python and Node.js remains challenging. Despite implementing measures for Python dependencies, certain issues persist. Additionally, our whitelist approach may inadvertently block legitimate behaviors in Python and Node.js. We are committed to further optimizing the sandbox to enhance the user experience.

We're excited to announce that DifySandbox is now open-source! This move aligns with our commitment to open-source principles and aims to expand the possibilities for DifySandbox. We welcome everyone to join us in shaping the future of the sandbox, including potential improvements like image processing, flexible data analysis, and creating images and videos.
