Agent-Ark/Toucan-1.5M

General NLP, English, apache-2.0

Created by Agent-Ark in 2025, Agent-Ark/Toucan-1.5M is an English General NLP dataset containing 1,646,546 records in Parquet format. It has about 5.2K downloads and 218 likes. It is released under the apache-2.0 license and falls in the 1M<n<10M size category.

About Agent-Ark/Toucan-1.5M

🦤 Toucan-1.5M is the largest fully synthetic tool-agent dataset to date, designed to advance tool use in agentic LLMs. It comprises over 1.5 million trajectories synthesized from 495 real-world Model Context Protocols (MCPs) spanni...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: 1,646,546
Size: 1M<n<10M
Creator: Agent-Ark
Year: 2025
License: apache-2.0
Downloads: 5,194
Likes: 218
