Introducing Workflow File Upload: Google NotebookLM Podcast Demo
Aug 1, 2023 · Oct 21, 2024
Today, we're excited to launch Dify v0.10.0, now with file upload capability. Dify helps developers quickly bring AI ideas to life, whether for product prototypes or productivity tools. With this update, Workflow supports various document formats, audio, and video files, pushing the boundaries of AI application development even further.
Why This Matters
The new file upload feature enables:
Document Q&A: Answer questions based on uploaded documents with reliable source referencing.
Report Summaries: Extract key insights from long documents to create concise summaries.
Spreadsheet Processing: Retrieve and manipulate specific content within documents or spreadsheets efficiently.
More importantly, this enhancement opens the door to multimodal AI applications. Developers can now create workflows that process images, audio, and video, greatly improving functionality and user experience.
Getting Started with File Uploads
Enable File Upload Directly
Activating the file upload feature is easy. Toggle it on in the feature list (references to uploaded files are stored in the system variable sys.files). Users can upload files through the chat interface, and the most recent upload automatically replaces the previous one. Developers can enable the memory feature for more flexible context management.
Create Custom Variables
Alternatively, create custom variables in the Start node to support single or multiple file uploads. Once set up, the app interface displays a file upload form, and conversations and workflow runs are built around the uploaded files.
After upload, files need to be preprocessed according to their type so the LLM can analyze them effectively:
Document Files (e.g., TXT, PDF, HTML): Use the Doc Extractor node to extract text into a string variable for LLM usage.
Audio/Video Files: Process them with tools such as audio-to-text conversion or video keyframe extraction.
Note: OpenAI's 'gpt-4o-audio-preview' model supports direct audio processing for reasoning and conversation. Support for direct audio input will be integrated in a future update.
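The type-based preprocessing above amounts to routing each file to a different step before it reaches the LLM. The Python below is a minimal sketch of that routing, assuming routing by file extension; the extension groups and step names (doc_extractor, speech_to_text, keyframe_extraction, vision_input) are hypothetical labels, not Dify's internal implementation or its actual list of supported formats.

```python
from pathlib import Path

# Hypothetical extension groups; Dify's real supported-format lists may differ.
DOCUMENT_EXTS = {".txt", ".pdf", ".html", ".htm", ".md", ".docx", ".csv", ".xlsx"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def preprocessing_step(filename: str) -> str:
    """Return the (hypothetical) preprocessing step a file needs before an LLM sees it."""
    ext = Path(filename).suffix.lower()  # case-insensitive match on the extension
    if ext in DOCUMENT_EXTS:
        return "doc_extractor"        # extract text into a string variable
    if ext in AUDIO_EXTS:
        return "speech_to_text"       # transcribe the audio first
    if ext in VIDEO_EXTS:
        return "keyframe_extraction"  # sample representative frames
    if ext in IMAGE_EXTS:
        return "vision_input"         # hand directly to a vision-capable model
    return "unsupported"


if __name__ == "__main__":
    for name in ["report.PDF", "interview.mp3", "demo.mp4", "chart.png", "archive.zip"]:
        print(name, "->", preprocessing_step(name))
```

In a Workflow the same decision is made visually by choosing which node follows the file variable; the function only makes the mapping explicit.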
This release also introduces Doc Extractor and List Operation nodes for file extraction and filtering, along with enhancements to most Workflow nodes. For more details, see our documentation.
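Filtering a list of uploaded files is roughly the kind of job a List Operation node handles. As a hedged illustration only, the sketch below keeps files of one type, optionally under a size cap, and takes the first few in upload order. The UploadedFile class and the filter_files parameters are hypothetical and do not reflect the node's actual options or Dify's file object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UploadedFile:
    """Hypothetical stand-in for an uploaded file reference (not Dify's real file object)."""
    name: str
    type: str  # e.g. "document", "image", "audio", "video"
    size: int  # size in bytes


def filter_files(files: List[UploadedFile], file_type: str,
                 max_size: Optional[int] = None,
                 limit: Optional[int] = None) -> List[UploadedFile]:
    """Keep files of one type (optionally under a size cap) in upload order, then take the first `limit`."""
    kept = [f for f in files
            if f.type == file_type and (max_size is None or f.size <= max_size)]
    return kept if limit is None else kept[:limit]


if __name__ == "__main__":
    uploads = [
        UploadedFile("notes.pdf", "document", 120_000),
        UploadedFile("cover.png", "image", 45_000),
        UploadedFile("appendix.pdf", "document", 9_000_000),
        UploadedFile("faq.txt", "document", 3_000),
    ]
    docs = filter_files(uploads, "document", max_size=1_000_000)
    print([f.name for f in docs])  # documents under 1 MB, in upload order
```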
Building AI Podcasts with File Uploads
Google recently introduced NotebookLM, an AI tool that stands out for its new audio feature: it quickly analyzes large volumes of content, extracts key points, and generates podcast-style conversational audio summaries that save users time and help them grasp the main ideas.
Now, let's demonstrate how to use Workflow's file upload feature and related nodes to turn documents into AI-driven podcasts, achieving functionality similar to NotebookLM.
Configuring the Start Node
Create a new Chatflow, and in the Start Node, configure file upload and define key variables like tone, host name, guest name, and language:
file: Select "Single File" as the field type to enable document uploads.
tone: Use "Dropdown Options" with choices like Casual, Formal, and Humorous to let users customize the podcast style.
host_name: Choose "Text" for entering the host's name.
guest_name: Choose "Text" for entering the guest's name.
language: Use "Dropdown Options" with choices like 中文 (Chinese), English, and 日本語 (Japanese) to let users select the podcast language.
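The Start node form above implies a few constraints on user input: a document must be provided, the dropdowns accept only their listed options, and the names must be non-empty. The sketch below mirrors those constraints in a Python dataclass. PodcastInputs and its validate method are hypothetical illustrations, not part of Dify; the option lists simply repeat the examples given.

```python
from dataclasses import dataclass

# Option lists mirror the example Start node configuration.
TONE_OPTIONS = ("Casual", "Formal", "Humorous")
LANGUAGE_OPTIONS = ("中文", "English", "日本語")


@dataclass
class PodcastInputs:
    file: str        # the single uploaded document (represented here by its filename)
    tone: str        # dropdown
    host_name: str   # free text
    guest_name: str  # free text
    language: str    # dropdown

    def validate(self) -> None:
        """Raise ValueError if any input violates the form's constraints."""
        if not self.file:
            raise ValueError("a document must be uploaded")
        if self.tone not in TONE_OPTIONS:
            raise ValueError(f"tone must be one of {TONE_OPTIONS}")
        if self.language not in LANGUAGE_OPTIONS:
            raise ValueError(f"language must be one of {LANGUAGE_OPTIONS}")
        for field_name in ("host_name", "guest_name"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")


if __name__ == "__main__":
    PodcastInputs("paper.pdf", "Casual", "Ana", "Ben", "English").validate()
    print("inputs OK")
```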
Generating Podcast Scripts with Doc Extractor and LLM Nodes
Once the file is uploaded, the Doc Extractor node pulls text from the file variable, converting unstructured data into text the LLM can process. This content then flows through three LLM nodes to create a complete podcast script:
LLM Analysis Node (Analyze the Input)
This node analyzes the extracted text to distill key information needed for the podcast, including important themes, story points, and data, laying the groundwork for content creation.
LLM Script Generation Node (Craft the Dialogue)
Based on the analyzed content and the preset variables (tone, language, host_name, guest_name), this node generates a natural, engaging, and personalized podcast dialogue, ensuring the interactions match the defined roles and style.
LLM Summary Node (Conclusion)
This node produces a podcast summary by recapping key points through the host and guest dialogue, offering thoughtful insights.
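The three-node chain above can be sketched as plain Python functions wrapped around a generic llm callable (any function that takes a prompt string and returns text). This is a hypothetical illustration, not how Dify executes nodes: the prompt wording is invented, and we assume the summary stage works from the analysis output, although it could equally read the generated dialogue.

```python
from typing import Callable, Tuple

LLM = Callable[[str], str]  # any prompt -> text function (a real model client, or a stub)


def analyze(llm: LLM, document_text: str) -> str:
    # Stage 1 (Analyze the Input): distill themes, story points, and data.
    return llm(f"Extract the key themes, story points, and data from:\n{document_text}")


def craft_dialogue(llm: LLM, analysis: str, tone: str, language: str,
                   host_name: str, guest_name: str) -> str:
    # Stage 2 (Craft the Dialogue): personalize using the Start node variables.
    prompt = (
        f"Write a {tone} podcast dialogue in {language} between host {host_name} "
        f"and guest {guest_name}, based on this analysis:\n{analysis}"
    )
    return llm(prompt)


def conclude(llm: LLM, analysis: str, host_name: str, guest_name: str) -> str:
    # Stage 3 (Conclusion): the host and guest recap the key points.
    return llm(f"Write a closing recap spoken by {host_name} and {guest_name} of:\n{analysis}")


def generate_script_parts(llm: LLM, document_text: str, tone: str, language: str,
                          host_name: str, guest_name: str) -> Tuple[str, str]:
    """Run the three stages in order and return (dialogue, summary)."""
    analysis = analyze(llm, document_text)
    dialogue = craft_dialogue(llm, analysis, tone, language, host_name, guest_name)
    summary = conclude(llm, analysis, host_name, guest_name)
    return dialogue, summary
```

Passing the model in as a parameter keeps the chain testable with a stub that just records prompts, which is also a convenient way to check that each stage receives the variables it needs.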
Merging Content with the Template Node
After processing with the LLM nodes, we have the podcast dialogue and summary. The Template node then combines these elements into a cohesive script.
Input: Retrieve the text segments from the Craft the Dialogue and Conclusion nodes, referenced via the variables arg1 and arg2.
Output: Merge arg1 (dialogue content) and arg2 (summary) into a coherent podcast script, output as a string ready for the next processing step.
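In Dify the Template node is written in Jinja2 syntax, so the node body would reference the two inputs as {{ arg1 }} and {{ arg2 }}. The Python below only mimics that merge with plain string handling, as a sketch of what the node produces; the blank-line separator and the trimming of empty parts are assumptions, not the template's actual contents.

```python
def merge_script(arg1: str, arg2: str) -> str:
    """Join the dialogue (arg1) and the summary (arg2) into one script string."""
    # Drop empty or whitespace-only parts so the script never starts or ends with blank lines.
    parts = [p.strip() for p in (arg1, arg2) if p and p.strip()]
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(merge_script("Host: Hi\nGuest: Hello", "Host: That's a wrap."))
```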
Configuring and Generating the Podcast Audio
Finally, the script is passed to the Podcast Audio Generator via the Template node's output, initiating the audio generation stage.
This tool converts the script into podcast audio. Developers can choose voices for the host and guest (e.g., "Alloy" and "Shimmer") to define the vocal style. The generator then produces the podcast as an audio file, ready for download.
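Assigning separate voices to the host and guest implies that the script is split by speaker before synthesis. The sketch below shows one way to do that, assuming (hypothetically) one "Name: text" line per turn; the generator's real input format may differ, and the text-to-speech call itself is left out. The default voice strings match the example names, which are also the lowercase names of two OpenAI TTS voices.

```python
from typing import List, Tuple


def assign_voices(script: str, host_name: str, guest_name: str,
                  host_voice: str = "alloy",
                  guest_voice: str = "shimmer") -> List[Tuple[str, str]]:
    """Split a 'Speaker: text' script into (voice, text) segments for per-speaker TTS."""
    voices = {host_name: host_voice, guest_name: guest_voice}
    segments: List[Tuple[str, str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() in voices:
            segments.append((voices[speaker.strip()], text.strip()))
        elif segments:
            # No known speaker label: treat the line as a continuation of the previous turn.
            voice, prev = segments[-1]
            segments[-1] = (voice, f"{prev} {line}")
        # Unlabeled lines before the first turn are dropped.
    return segments
```

Each segment could then be synthesized with its voice and the clips concatenated in order to produce the final audio file.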
By following these steps, you can easily create AI-generated podcasts with the file upload feature. We've also made this application available as a template on the Explore page, allowing you to get started quickly.
Image Upload Deprecation Notice
In Dify v0.10.0, we've upgraded the image upload capability into a file upload feature, allowing applications to manage documents, audio, and video files alongside images.
For Chatflow Applications
The image upload functionality is now part of the file upload feature. Once it is enabled, you can reference images and other files sent in the chat window by selecting sys.files through the visual variable selector in the LLM node.
We've ensured backward compatibility, so applications using the previous image upload feature will continue to work seamlessly.
For Workflow Applications
We recommend creating custom file-type variables in the Start Node to manage a broader range of files.
Note: The old image upload feature and the sys.files system variable will be phased out in future versions.