HuggingFaceM4/the_cauldron
General NLP, English
HuggingFaceM4/the_cauldron is an English-language dataset, categorized as General NLP, that provides 1,880,992 labeled examples in Parquet format. It falls in the 1M<n<10M size category and has been downloaded 287.7K times.
About HuggingFaceM4/the_cauldron
Dataset Card for The Cauldron
Dataset description
The Cauldron is part of the Idefics2 release.
It is a massive collection of 50 vision-language datasets (training sets only) that were used for the fine-tuning of the vision-...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 1,880,992
- Size: 1M<n<10M
- Creator: HuggingFaceM4
- Year: 2026
- Downloads: 287,723
- Likes: 547
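
Because The Cauldron is described above as a collection of separate vision-language datasets with training sets only, loading it usually means choosing one constituent dataset at a time. The following is a minimal sketch, not an official recipe. It assumes the Hugging Face `datasets` library is installed and that each constituent dataset is exposed as a named configuration of the repository. The helper name `load_subset` is hypothetical, and the configuration name `ai2d` in the usage comment is an assumption that does not appear on this page.

```python
REPO_ID = "HuggingFaceM4/the_cauldron"  # repository name taken from this page


def load_subset(config_name, split="train", streaming=True):
    """Load one constituent dataset of The Cauldron.

    config_name: name of the sub-dataset configuration (assumed to exist).
    split: only training sets are provided, per the description above.
    streaming: if True, iterate over examples without downloading
               every Parquet file first.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split, streaming=streaming)


# Example usage (requires network access; "ai2d" is an assumed config name):
#     subset = load_subset("ai2d")
#     first_example = next(iter(subset))
#     print(sorted(first_example.keys()))
```

Streaming is used as the default here because the full collection has close to two million rows. Pass `streaming=False` if a fully downloaded, indexable copy is preferred.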