Agent-Ark/Toucan-1.5M
General NLP, English, apache-2.0
Created by Agent-Ark in 2025, Agent-Ark/Toucan-1.5M is a General NLP dataset in English containing 1,646,546 records in Parquet format. It has about 5.2K downloads and 218 likes. It is released under the apache-2.0 license and falls in the 1M<n<10M size category.
About Agent-Ark/Toucan-1.5M
🦤 Toucan-1.5M:
Toucan-1.5M is the largest fully synthetic tool-agent dataset to date, designed to advance tool use in agentic LLMs. It comprises over 1.5 million trajectories synthesized from 495 real-world Model Context Protocols (MCPs) spanni...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 1,646,546
- Size: 1M<n<10M
- Creator: Agent-Ark
- Year: 2025
- License: apache-2.0
- Downloads: 5194
- Likes: 218
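The size category listed above is a bucket on the row count, not an exact figure. As a minimal sketch, the check below confirms that the listed row count falls inside the listed bucket. It assumes the bucket string follows the pattern `<low><unit><n<<high><unit>` with strict bounds and K/M/B suffixes; the helper names `parse_size_category` and `in_size_category` are hypothetical and not part of any library.

```python
import re


def parse_size_category(category: str) -> tuple[int, int]:
    """Parse a bucket such as '1M<n<10M' into (low, high) bounds.

    Assumption: bounds are strict and suffixes are K, M, or B.
    """
    multipliers = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    match = re.fullmatch(r"(\d+)([KMB]?)<n<(\d+)([KMB]?)", category)
    if match is None:
        raise ValueError(f"unrecognized size category: {category!r}")
    low_num, low_unit, high_num, high_unit = match.groups()
    low = int(low_num) * multipliers[low_unit]
    high = int(high_num) * multipliers[high_unit]
    return low, high


def in_size_category(rows: int, category: str) -> bool:
    """Return True if rows lies strictly between the bucket's bounds."""
    low, high = parse_size_category(category)
    return low < rows < high


# Values taken from the listing above.
rows = 1646546
category = "1M<n<10M"
print(in_size_category(rows, category))
```

A check like this can be handy when validating catalog metadata in bulk, since a row count and its size bucket are recorded separately and can drift apart.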