Release

Introducing Dify Workflow File Upload: A Demo on AI Podcast

We’re launching the file upload feature today, with a demo on using it in an AI podcast application.

Joshua, Content Marketing

Evan Chen, Product Manager

Written on Oct 21, 2024

Today, we're excited to launch Dify v0.10.0, now with file upload capability. Dify helps developers quickly bring AI ideas to life, whether for product prototypes or productivity tools. With this update, Workflow supports various document formats, audio, and video files, pushing the boundaries of AI application development even further.

Why This Matters

The new file upload feature enables:

  • Document Q&A: Answer questions based on uploaded documents with reliable source referencing.

  • Report Summaries: Extract key insights from long documents to create concise summaries.

  • Spreadsheet Processing: Retrieve and manipulate specific content within documents or spreadsheets efficiently.

More importantly, this enhancement opens the door to multimodal AI applications. Developers can now create workflows that process images, audio, and video, greatly improving functionality and user experience.

Getting Started with File Uploads

  1. Enable File Upload Directly

    Activating the file upload feature is easy. Toggle it on in the feature list (with file references stored in the system variable sys.files). Users can upload files through the chat interface, and the most recent file will replace the previous one automatically. Developers can enable the memory feature for more flexible context management.

  2. Create Custom Variables

    Alternatively, create custom variables in the Start node to support single or multiple file uploads. Once set up, the interface will display a file upload form, with dialogues and workflows centered around the uploaded files.
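Whichever route you take, uploaded files can also be supplied programmatically. Below is a minimal sketch of uploading a document and referencing it in a chat message through Dify's HTTP API (endpoint paths and field names follow the current API reference and may differ slightly between versions; the API key, file name, and user ID are placeholders):

```python
import requests

API_BASE = "https://api.dify.ai/v1"  # or your self-hosted API endpoint
HEADERS = {"Authorization": "Bearer YOUR_APP_API_KEY"}  # placeholder key

# 1. Upload the document so the app can reference it.
with open("report.pdf", "rb") as f:
    upload = requests.post(
        f"{API_BASE}/files/upload",
        headers=HEADERS,
        files={"file": ("report.pdf", f, "application/pdf")},
        data={"user": "user-123"},
    )
file_id = upload.json()["id"]

# 2. Send a chat message that carries the file. With the feature toggle
#    enabled, files passed this way are exposed to the app as sys.files.
resp = requests.post(
    f"{API_BASE}/chat-messages",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "query": "Summarize the attached report.",
        "inputs": {},
        "user": "user-123",
        "response_mode": "blocking",
        "files": [
            {
                "type": "document",
                "transfer_method": "local_file",
                "upload_file_id": file_id,
            }
        ],
    },
)
print(resp.json()["answer"])
```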

After uploading, files need preprocessing based on their type for effective analysis by the LLM:

  • Document Files (e.g., TXT, PDF, HTML): Use the Doc Extractor node to extract text into a string variable for LLM usage.

  • Audio/Video Files: Require tools like audio-to-text conversion or video keyframe extraction.

Note: OpenAI's 'gpt-4o-audio-preview' model supports direct audio input for reasoning and conversation. This capability will be integrated in future updates.
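In the meantime, audio can be converted to text before it reaches the LLM node. The sketch below shows the general idea using OpenAI's transcription API; it illustrates the technique rather than Dify's built-in tooling, and the file name is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Turn an uploaded audio file into plain text the LLM node can reason over.
with open("episode.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```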

This release also introduces Doc Extractor and List Operation nodes for file extraction and filtering, along with enhancements to most Workflow nodes. For more details, see our documentation.

Building AI Podcasts with File Uploads

Google recently introduced NotebookLM, an AI tool that stands out for its new audio feature. It quickly analyzes large volumes of content, extracts key points, and generates conversational voice summaries, much like a podcast, saving users time and helping them grasp the main ideas.

Now, let's demonstrate how to use Workflow's file upload feature and related nodes to turn documents into AI-driven podcasts, achieving functionality similar to NotebookLM.

Configuring the Start Node

Create a new Chatflow, and in the Start Node, configure file upload and define key variables like tone, host name, guest name, and language:

  • file: Select "Single File" as the field type to enable document uploads.

  • tone: Use "Dropdown Options" with choices like Casual, Formal, and Humorous to let users customize the podcast style.

  • host_name: Choose "Text" for entering the host’s name.

  • guest_name: Choose "Text" for entering the guest’s name.

  • language: Use "Dropdown Options" with choices like 中文, English, 日本語 to allow users to select the podcast language.
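When the Chatflow is invoked through the API rather than the web UI, these Start-node variables travel in the `inputs` object of the request. A minimal sketch of that payload (a single-file variable is passed as an object referencing a previously uploaded file; the exact field names may vary by version, and all values here are placeholders):

```python
# Example "inputs" payload for the podcast Chatflow; keys mirror the
# Start-node variables defined above, values are placeholders.
inputs = {
    "file": {
        "type": "document",
        "transfer_method": "local_file",
        "upload_file_id": "ID_RETURNED_BY_FILES_UPLOAD",
    },
    "tone": "Casual",
    "host_name": "Alex",
    "guest_name": "Sam",
    "language": "English",
}
```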

Generating Podcast Scripts with Doc Extractor and LLM Nodes

Once the file is uploaded, the Doc Extractor node pulls text from the file variable, converting unstructured data into text the LLM can process. This content then flows through three LLM nodes to create a complete podcast script:

  1. LLM Analysis Node (Analyze the Input)

    This node analyzes the extracted text to distill key information needed for the podcast, including important themes, story points, and data, laying the groundwork for content creation.

  2. LLM Script Generation Node (Craft the Dialogue)

    Based on the analyzed content and preset variables (tone, language, host_name, guest_name), this node generates a natural, engaging, and personalized podcast dialogue, ensuring interactions align with the defined roles and styles (an illustrative prompt sketch follows this list).

  3. LLM Summary Node (Conclusion)

    This node produces a podcast summary by recapping key points through the host and guest dialogue, offering thoughtful insights.
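To make the script-generation step (node 2 above) concrete, its prompt could look roughly like the sketch below, with the Start-node variables and the analysis output inserted through the variable selector. The wording and placeholder names are illustrative, not a prescribed prompt:

```
You are writing a podcast script in {{language}} with a {{tone}} tone.
The host is {{host_name}} and the guest is {{guest_name}}.

Using the key themes, story points, and data in the analysis below,
write a natural, engaging dialogue between the host and the guest.
Alternate speakers, keep turns conversational, and stay faithful to
the source material.

Analysis:
{{analysis_output}}
```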

Merging Content with the Template Node

After processing with the LLM nodes, we have the podcast dialogue and summary. The Template node then combines these elements into a cohesive script.

  • Input: Retrieve text segments from the Craft the Dialogue and Conclusion nodes, referenced via variables arg1 and arg2.

  • Output: Merge arg1 (dialogue content) and arg2 (summary) to generate a coherent podcast script, outputting it as a string ready for the next processing step.
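The Template node supports Jinja2-style syntax, so the merge itself can stay simple. A minimal sketch, assuming the two inputs are named arg1 and arg2 as above:

```
{{ arg1 }}

{# closing summary follows #}
{{ arg2 }}
```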

Configuring and Generating the Podcast Audio

Finally, the script is passed to the Podcast Audio Generator via the Template output, initiating the audio generation stage.

This tool converts the script into podcast audio. Developers can choose voices for the host and guest (e.g., "Alloy" and "Shimmer") to define the vocal style. The generator then produces the podcast as an audio file, ready for download.
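Conceptually, this stage is text-to-speech with two voices. The sketch below illustrates the idea with OpenAI's speech API, using 'alloy' for the host and 'shimmer' for the guest. It is only an illustration of the technique, not how the built-in Podcast Audio Generator is implemented, and the "Host:"/"Guest:" line convention is an assumption about the script format:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed convention: each dialogue line starts with "Host:" or "Guest:".
lines = Path("podcast_script.txt").read_text().splitlines()
voices = {"Host": "alloy", "Guest": "shimmer"}

for i, line in enumerate(l for l in lines if ":" in l):
    speaker, text = line.split(":", 1)
    audio = client.audio.speech.create(
        model="tts-1",
        voice=voices.get(speaker.strip(), "alloy"),
        input=text.strip(),
    )
    # Write each turn to its own file; stitch the segments together
    # afterwards with a tool such as pydub or ffmpeg.
    Path(f"segment_{i:03d}.mp3").write_bytes(audio.content)
```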

By following these steps, you can easily create AI-generated podcasts with the file upload feature. We've also made this application available as a template on the Explore page, allowing you to get started quickly.

Image Upload Deprecation Notice

In Dify v0.10.0, we've expanded the image upload capability into a broader file upload feature, allowing applications to manage documents, audio, and video files alongside images.

For Chatflow Applications

The image upload functionality is now part of the file upload feature. Once enabled, you can reference images and other files in the chat window by selecting sys.files through the visual variable selector in the LLM node.

We've ensured backward compatibility, so applications using the previous image upload feature will continue to work seamlessly.

For Workflow Applications

We recommend creating custom file-type variables in the Start Node to manage a broader range of files.

Note: The old image upload feature and the sys.files system variable will be phased out in future versions.
