
We're excited to announce that a suite of powerful audio AI plugins from DupDub is now available on the Dify Marketplace. This integration brings advanced capabilities like speech transcription, voice cloning, speaker identification, and text-to-speech synthesis directly into the Dify platform, empowering our users to build more versatile, accessible, and engaging AI applications.
Dify Marketplace: Fueling AI Innovation
Launched alongside Dify v1.0.0, our Marketplace is rapidly growing into a vibrant ecosystem built upon our open-source AI application development platform. Hosting a diverse range of plugins like Models, Tools, Agent Strategies, Extensions, and Bundles, the Marketplace empowers Dify users to innovate and scale AI solutions faster. Its modular design makes it the ideal environment for integrating cutting-edge tools like those from DupDub, further enhancing the multi-modal capabilities available within the Dify community.

Introducing DupDub Tools: Advanced Audio Processing in Dify
With DupDub's tools now integrated into Dify, users can seamlessly incorporate sophisticated audio processing into their AI workflows. Key capabilities available include:
Speech Transcription: Accurately convert speech in audio and video files into text using the TranscribeSpeech tool. This supports content analysis, subtitle generation, and data processing, and it improves accessibility.
Voice Cloning: Create unique, personalized voice experiences. The Voice Cloning tool allows you to clone a specific speaker's voice from a sample, perfect for generating consistent brand voices, personalized assistants, or localized content narration.
Speaker Identification: Use the Get Speaker ID tool to identify and differentiate between multiple speakers within an audio file. This is invaluable for analyzing meeting transcripts, customer service calls, or any multi-participant audio scenario.
Text-to-Speech (TTS) Synthesis: Generate natural-sounding speech from text with the Speech Synthesis tool. Customize the output by selecting cloned or standard voices, adjusting speed, and modifying pitch to create high-quality audio for various applications.
These features can be easily orchestrated within Dify's visual workflow builder, allowing for automated and efficient processing of audio-related tasks with minimal manual effort.

Getting Started with DupDub Tools in Dify
Below is a quick walkthrough of how to use DupDub’s tools in Dify.
TranscribeSpeech Tool
Easily integrate speech-to-text transcription into your workflow with this tool.
Add the Tool to the Workflow:
Go to Orchestrate in the left panel.
Click on Tools and search for TranscribeSpeech under the DupDub plugin.
Drag and drop the TranscribeSpeech tool into the workflow.
Configure the TranscribeSpeech Tool:
Connect the START node to the TranscribeSpeech node.
Fill in the Input Variables:
Duration (Required): Enter the duration of the video/audio.
URL (Required): Provide the link to the video/audio file.
Language (Required): Set the language of the content (e.g., en for English).
Additional Settings:
Retry on Failure: Enable if you want the tool to retry on errors.
Error Handling: Choose how errors should be managed.
Next Step: Define what happens after transcription (e.g., further processing).
Run and Publish:
Click Run to execute the workflow.
After successful testing, click Publish to finalize and deploy the workflow.
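
Once the workflow is published, it can also be triggered programmatically. The following is a minimal sketch, assuming the workflow is exposed through Dify's workflow run endpoint (`/workflows/run`) and that its START node defines variables named `url`, `duration`, and `language`. Those variable names, the `example-user` identifier, and the helper function are illustrative assumptions, so match them to your own workflow. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: Dify cloud base URL; self-hosted deployments use their own host.
DIFY_BASE_URL = "https://api.dify.ai/v1"


def build_transcribe_request(
    api_key: str, url: str, duration: float, language: str
) -> urllib.request.Request:
    """Build (but do not send) a request that runs a published workflow."""
    # The walkthrough marks all three inputs as required.
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an http(s) link to the media file")
    if not language:
        raise ValueError("language is required, e.g. 'en'")

    payload = {
        # Hypothetical START-node variable names; rename to match your workflow.
        "inputs": {"url": url, "duration": duration, "language": language},
        "response_mode": "blocking",
        "user": "example-user",
    }
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/workflows/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. The walkthrough does not state the unit for Duration, so check the plugin's field description before choosing a value.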

Voice Cloning Tool
Add powerful voice cloning capabilities to your workflow for precise and customizable audio replication.
Add the Tool to the Workflow:
Go to Orchestrate in the left panel.
Click on Tools and search for Voice Cloning under the DupDub plugin.
Drag and drop the Voice Cloning tool into the workflow.

Configure the Voice Cloning Tool:
Connect the START node to the Voice Cloning node.
Fill in the Input Variables:
Speaker Name (Required): Enter the name of the speaker whose voice will be cloned.
URL (Required): Provide the link to the speech sample (supports WAV, MP3, MP4 formats).
Language (Required): Specify the language of the speech sample (e.g., en for English, zh for Chinese).
Gender (Required): Specify the speaker’s gender (MALE or FEMALE).
Age (Required): Indicate the speaker’s age group (Children, Youth, Adults, or Seniors).
Additional Settings:
Retry on Failure: Enable if you want the tool to automatically retry in case of errors.
Error Handling: Select how the system should handle potential errors.
Next Step: Define the subsequent steps after the voice cloning process (e.g., synthesis or playback).
Run and Publish:
Click Run to execute the workflow and initiate the voice cloning process.
After successful testing, click Publish to finalize and deploy the workflow.
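
Because every Voice Cloning input is required and several accept only fixed values, it can help to validate them before they reach the node. The sketch below is one way to do that, assuming the allowed values are exactly those listed above. The class and field names are hypothetical and are not part of the DupDub plugin.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Allowed values as listed in the walkthrough.
GENDERS = {"MALE", "FEMALE"}
AGE_GROUPS = {"Children", "Youth", "Adults", "Seniors"}
SAMPLE_EXTENSIONS = (".wav", ".mp3", ".mp4")


@dataclass(frozen=True)
class VoiceCloningInputs:
    """Hypothetical container for the Voice Cloning input variables."""

    speaker_name: str
    url: str
    language: str
    gender: str
    age: str

    def __post_init__(self) -> None:
        if not self.speaker_name.strip():
            raise ValueError("speaker_name is required")
        # Check the extension on the URL path so query strings are ignored.
        path = urlparse(self.url).path.lower()
        if not path.endswith(SAMPLE_EXTENSIONS):
            raise ValueError("sample must be a WAV, MP3, or MP4 file")
        if not self.language:
            raise ValueError("language is required, e.g. 'en' or 'zh'")
        if self.gender not in GENDERS:
            raise ValueError(f"gender must be one of {sorted(GENDERS)}")
        if self.age not in AGE_GROUPS:
            raise ValueError(f"age must be one of {sorted(AGE_GROUPS)}")
```

Constructing `VoiceCloningInputs("Brand Voice", "https://example.com/sample.wav", "en", "FEMALE", "Adults")` succeeds, while an unsupported file type or an age group outside the list raises `ValueError` before the workflow runs.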

Get Speaker ID Tool
Use this tool to identify speakers within your workflow, enabling advanced audio processing and personalization.
Add the Tool to the Workflow:
Go to Orchestrate in the left panel.
Click on Tools and search for Get Speaker ID under the DupDub plugin.
Drag and drop the Get Speaker ID tool into the workflow.

Configure the Get Speaker ID Tool:
Connect the START node to the Get Speaker ID node.
No input variables are required for this tool in the current configuration.
Additional Settings:
Retry on Failure: Enable this option if you want the tool to automatically retry in case of errors.
Error Handling: Choose how to handle errors if they occur during the process.
Next Step: Select the next block to continue the workflow after retrieving the speaker ID.
Run and Publish:
Click Run to execute the workflow and retrieve the speaker ID.
After successful testing, click Publish to finalize and deploy the workflow.

This process facilitates the identification of speakers in your workflow, allowing for more advanced audio processing and personalized outputs.
Speech Synthesis Tool
Seamlessly integrate text-to-speech synthesis into your workflow for customized, high-quality audio generation.
Add the Tool to the Workflow:
Go to Orchestrate in the left panel.
Click on Tools and search for Speech Synthesis under the DupDub plugin.
Drag and drop the Speech Synthesis tool into the workflow.

Configure the Speech Synthesis Tool:
Connect the START node to the Speech Synthesis node.
Fill in the Input Variables:
Speaker Name (Required): Enter the name of the speaker for the synthesized voice.
Speaker (Required): Provide the identifier for the speaker whose voice will be used.
Speed (Optional): Set the speed of speech. The default is normal speed (1.0).
Pitch (Optional): Set the pitch of the speech. The default is standard pitch (0).
Text (Required): Enter the text that will be converted into speech.
Additional Settings:
Retry on Failure: Enable if you want the tool to automatically retry in case of errors.
Error Handling: Choose how errors should be managed.
Next Step: Define the subsequent actions after speech synthesis (e.g., saving the file or playback).
Run and Publish:
Click Run to execute the workflow and synthesize speech from the provided text.
After successful testing, click Publish to finalize and deploy the workflow.
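
The split between required and optional inputs can be sketched as a small helper that fills in the documented defaults (speed 1.0, pitch 0) when the optional values are omitted. The function and key names are hypothetical, and the valid ranges for speed and pitch are not stated in this walkthrough, so the sketch does not enforce any.

```python
from typing import Any, Dict, Optional

DEFAULT_SPEED = 1.0  # normal speed, per the walkthrough
DEFAULT_PITCH = 0    # standard pitch, per the walkthrough


def synthesis_inputs(
    speaker_name: str,
    speaker: str,
    text: str,
    speed: Optional[float] = None,
    pitch: Optional[float] = None,
) -> Dict[str, Any]:
    """Return Speech Synthesis inputs with defaults applied (hypothetical keys)."""
    # Speaker Name, Speaker, and Text are marked as required.
    required = (("speaker_name", speaker_name), ("speaker", speaker), ("text", text))
    for label, value in required:
        if not value or not value.strip():
            raise ValueError(f"{label} is required")
    return {
        "speaker_name": speaker_name,
        "speaker": speaker,
        "text": text,
        "speed": DEFAULT_SPEED if speed is None else speed,
        "pitch": DEFAULT_PITCH if pitch is None else pitch,
    }
```

Calling `synthesis_inputs("Narrator", "speaker-id", "Welcome to the course.")` yields a dictionary with `speed` set to 1.0 and `pitch` set to 0; here `"speaker-id"` is a placeholder for whatever identifier your workflow supplies.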

Example Applications: What You Can Build
The addition of DupDub tools unlocks exciting possibilities for Dify users. Imagine building:
Multilingual Content Creation Assistants: Use TranscribeSpeech to generate subtitles for your videos. Then, employ Voice Cloning and Speech Synthesis within a Dify workflow to create natural-sounding voiceovers in multiple languages, dramatically expanding your content's reach.
Enhanced E-Learning Platforms: Automatically transcribe lecture audio for searchable notes using TranscribeSpeech. Synthesize course materials into spoken lessons with customizable voices via Speech Synthesis, and even clone instructor voices for personalized student feedback loops.
Automated Corporate Training & Analysis: Convert static training manuals into engaging audio formats using Speech Synthesis. Leverage TranscribeSpeech for accessibility across languages and use Get Speaker ID to analyze recordings of team discussions or customer interactions managed through your Dify-built internal tools.
About DupDub
DupDub is a cutting-edge AI-powered platform that revolutionizes the way content is created, localized, and shared. Specializing in text-to-speech (TTS), voice cloning, and video dubbing, DupDub empowers creators, educators, marketers, and businesses to generate natural, high-quality speech and multilingual content effortlessly and at scale.
With a user-friendly interface and access to 700+ realistic AI voices in 90+ languages and accents, DupDub enables anyone to produce professional-grade voiceovers in minutes. Whether you're making YouTube videos, e-learning courses, podcasts, or promotional materials, DupDub helps you save time, cut production costs, and engage global audiences with authentic, localized voice experiences.
About Dify.AI
Dify.AI is revolutionizing AI-native application development by providing an open-source platform that simplifies the entire lifecycle of AI application creation, deployment, and management. With its extensible plugin ecosystem, Dify.AI enables developers and businesses to seamlessly integrate AI capabilities, customize workflows, and accelerate innovation. By lowering the barriers to AI adoption, Dify.AI empowers users to build intelligent applications with greater efficiency and flexibility.