Release

Dify.AI's New Dataset Feature Enhancements

The recent updates to Dify.AI's dataset management tools introduce a "Citations and Attributions" feature for easier documentation referencing, and a new Dataset API for efficient data management. Support for multiple file formats and document segmentation enhances data handling. Additionally, integration with GPT-3.5-turbo-instruct and various Hugging Face embedding models provides users with more model options for specific applications, improving overall user experience.

Dify

Dify.AI

Written on

Oct 12, 2023

Data management tooling changes quickly, and keeping up with new features matters. With this in mind, we are excited to roll out a series of updates that make our dataset management tools easier and more efficient to use. These updates refine existing features and the interface around them. Below is an overview of the new features and how each one contributes to a more streamlined data management experience.

Referencing Dataset Documentation

When you manually enable the "Citations and Attributions" feature in application orchestration, the output now displays the documentation sources it referenced, such as the names of the cited documents, and you can navigate directly from a citation to that document's editing page in the dataset. This makes it faster to locate a document and much easier to modify its segments afterwards.

New Dataset API Features

The Dataset API service is a tool for efficiently managing and using dataset documents. With the Dify Dataset API, you can upload datasets, update them in real time, and manage them programmatically. It is tightly integrated with large models, which further improves both the user experience and efficiency. We also provide examples below to help you understand the API quickly and get hands-on practice.

How to Use Dataset API Feature?

Navigate to the Dataset page, where you can switch to the API page from the navigation on the left. On this page, you can view the Dataset API documentation provided by Dify and manage the credentials for accessing the Dataset API in the API key section.
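Every call below sends the key from the API key section as a Bearer token. As a minimal sketch, assuming you export the key in an environment variable (the name DIFY_API_KEY is our own choice for illustration, not something Dify defines), a small Python helper can build that header once instead of pasting the key into every command:

```python
import os


def auth_headers(env_var: str = "DIFY_API_KEY") -> dict:
    """Return the Authorization header used by the Dataset API examples.

    The environment variable name is a hypothetical convention for this
    sketch; use whatever secret storage fits your setup.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"set {env_var} to a Dataset API key first")
    return {"Authorization": f"Bearer {api_key}"}
```

Keeping the key out of scripts and shell history also makes it easier to rotate it later from the API key section.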

Examples of Dataset API Calls

Create Empty Dataset

This method allows for the creation of an empty dataset.

curl --location --request POST 'https://api.dify.ai/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{"name": "name"}'

Dataset List

Query the dataset list by page number (page) and number of results per page (limit), which helps with managing and selecting datasets.

curl --location --request GET 'https://api.dify.ai/v1/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
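If you have more datasets than fit on one page, the same call can be repeated with an increasing page number. The Python sketch below uses only the standard library. The URL and the page and limit parameters come from the example above; the shape of the response (a JSON object with a "data" list) is an assumption on our part, so check it against the API documentation page before relying on it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # base URL used in the curl examples


def build_list_request(api_key: str, page: int = 1, limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) the GET request for one page of datasets."""
    query = urllib.parse.urlencode({"page": page, "limit": limit})
    return urllib.request.Request(
        f"{API_BASE}/datasets?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def iter_datasets(api_key: str, limit: int = 20, max_pages: int = 50):
    """Yield datasets page by page.

    Assumption: the response body is JSON with a "data" list, and an
    empty list means there are no more pages.
    """
    for page in range(1, max_pages + 1):
        with urllib.request.urlopen(build_list_request(api_key, page, limit)) as resp:
            items = json.load(resp).get("data", [])
        if not items:
            return
        yield from items
```

The max_pages cap is only a safety net for this sketch, so that a misread response cannot loop forever.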

Create Document via Text

Easily import existing text data through a simple text upload interface.

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "Dify",
    "text": "Dify means Do it for you...",
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {
            "pre_processing_rules": [{
                "id": "remove_extra_spaces",
                "enabled": true
            }, {
                "id": "remove_urls_emails",
                "enabled": true
            }],
            "segmentation": {
                "separator": "###",
                "max_tokens": 500
            }
        },
        "mode": "custom"
    }
}'
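When the text comes from your own program rather than a hand-written command, it is safer to let a JSON library produce the body so that quotes and newlines in the text are escaped correctly. The following Python sketch builds the same request as the curl example above; the function name and its defaults are ours, and the field names are copied from that example rather than from a full schema.

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # base URL used in the curl examples


def build_create_by_text_request(api_key, dataset_id, name, text,
                                 separator="###", max_tokens=500):
    """Build (but do not send) the create_by_text request shown above."""
    body = {
        "name": name,
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": {
            "rules": {
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": True},
                ],
                "segmentation": {"separator": separator, "max_tokens": max_tokens},
            },
            "mode": "custom",
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/document/create_by_text",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends the request; keeping the building step separate makes the payload easy to inspect first.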

Create Document via File

File upload supports a range of formats, including markdown, md, pdf, html, htm, xlsx, docx, and csv, which significantly expands your options for importing data.

curl --location --request POST 'https://api.dify.ai/v1/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data={
    "name": "Dify",
    "indexing_technique": "high_quality",
    "process_rule": {
        "rules": {
            "pre_processing_rules": [{
                "id": "remove_extra_spaces",
                "enabled": true
            }, {
                "id": "remove_urls_emails",
                "enabled": true
            }],
            "segmentation": {
                "separator": "###",
                "max_tokens": 500
            }
        },
        "mode": "custom"
    }
};type=text/plain' \
--form 'file=@"/path/to/file"'

Get Document Embedding Status (Progress)

Check the processing (embedding) progress of an uploaded batch of documents in real time, so you know when the data is ready to use.

curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}'
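In practice you will usually call this endpoint repeatedly until processing is done. Below is a minimal polling sketch in Python. The URL comes from the example above, but the response shape is an assumption: we assume a JSON object with a "data" list whose entries carry an "indexing_status" field, and that "completed" and "error" are the terminal values. Verify these names against the API documentation page before using the code.

```python
import json
import time
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # base URL used in the curl examples


def fetch_indexing_status(api_key, dataset_id, batch):
    """Call the indexing-status endpoint once and return the parsed JSON."""
    url = f"{API_BASE}/datasets/{dataset_id}/documents/{batch}/indexing-status"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_finished(body, done_states=("completed", "error")):
    """True when every document in the batch reached a terminal state.

    Assumed response shape: {"data": [{"indexing_status": "..."}, ...]}.
    """
    docs = body.get("data", [])
    return bool(docs) and all(d.get("indexing_status") in done_states for d in docs)


def wait_for_indexing(fetch, interval=2.0, max_checks=30, sleep=time.sleep):
    """Call fetch() until is_finished() is true or max_checks is exhausted."""
    for _ in range(max_checks):
        body = fetch()
        if is_finished(body):
            return body
        sleep(interval)
    raise TimeoutError("indexing did not finish in time")
```

A typical call would be wait_for_indexing(lambda: fetch_indexing_status(api_key, dataset_id, batch)). Passing the fetch function in keeps the loop easy to test without network access.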

Delete Document

Delete documents you no longer need, keeping the dataset clean and effective.

curl --location --request DELETE 'https://api.dify.ai/v1/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}'

Dataset Document List

Provides a quick query interface for viewing the basic information of all documents in a dataset.

curl --location --request GET 'https://api.dify.ai/v1/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}'

Add Document Segmentation

The segmentation feature offers a flexible way to adjust a document's structure: you can add segments, each with its own content and keywords, to better organize the document and make its data more usable.

curl 'https://api.dify.ai/v1/datasets/aac47674-31a8-4f12-aab2-9603964c4789/documents/2034e0c1-1b75-4532-849e-24e72666595b/segment' \
  --header 'Authorization: Bearer {api_key}' \
  --header 'Content-Type: application/json' \
  --data-raw '{"segments": [
    {"content": "Dify means Do it for you",
     "keywords": ["Dify", "Do"]
    }
  ]}' \
  --compressed
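To add several segments at once from a program, the body can be assembled from plain (content, keywords) pairs. This Python sketch mirrors the curl call above; the function name and the pair-based input are our own illustration, and the endpoint path and field names are taken as-is from that example.

```python
import json
import urllib.request

API_BASE = "https://api.dify.ai/v1"  # base URL used in the curl examples


def build_add_segments_request(api_key, dataset_id, document_id, segments):
    """Build (but do not send) the request that adds segments to a document.

    segments is an iterable of (content, keywords) pairs.
    """
    body = {
        "segments": [
            {"content": content, "keywords": list(keywords)}
            for content, keywords in segments
        ]
    }
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/documents/{document_id}/segment",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For example, build_add_segments_request(api_key, dataset_id, document_id, [("Dify means Do it for you", ["Dify", "Do"])]) produces the same body as the curl command.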

Other Features

This update also optimizes several smaller features for a smoother user experience. Let's take a look!

  • More Available Embedding Models:

The cloud version of Dify now supports open-source Embedding models hosted on Hugging Face, offering a broader selection. By switching between models and comparing their performance, you can find the Embedding model that works best for your specific application scenario.

  • Integration of GPT-3.5-turbo-instruct:

Dify has integrated OpenAI's newly launched GPT-3.5-turbo-instruct model, a significant step forward in how users interact with the model. It has been trained to address issues present in older models and is better at understanding and executing user instructions, so it gives clearer, more on-point answers and suits a broader range of applications.


via @dify_ai

If you like Dify, give it a Star ⭐️.
