HuggingFaceTB/cosmopedia-100k
General NLP | EN | apache-2.0
HuggingFaceTB/cosmopedia-100k is a General NLP dataset in English from HuggingFaceTB with 100,000 records in Parquet format. It is distributed under the apache-2.0 license, falls in the 100K<n<1M size category, and has been downloaded 677 times.
About HuggingFaceTB/cosmopedia-100k
Dataset description
This is a 100k subset of the Cosmopedia dataset, a synthetic dataset of textbooks, blog posts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1.
Here's how you can load the dataset:
from datasets import ...
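The snippet above is truncated in the source. As a minimal sketch of what loading might look like, the example below uses `load_dataset` from the Hugging Face `datasets` library with the dataset ID given on this page. The `train` split name and the helper `load_cosmopedia_100k` are assumptions made for illustration, not part of the original card.

```python
# Minimal sketch: load the dataset with the Hugging Face `datasets` library.
# Requires `pip install datasets` and network access on first use.

DATASET_ID = "HuggingFaceTB/cosmopedia-100k"  # dataset ID as listed on this page


def load_cosmopedia_100k(split="train", streaming=False):
    """Hypothetical helper; the "train" split name is an assumption.

    With streaming=True, records are fetched lazily instead of
    downloading all Parquet files up front.
    """
    # Imported lazily so that defining the helper does not require the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)


# Example usage (downloads data, so it is left commented out):
# ds = load_cosmopedia_100k()
# print(ds)      # shows the column names and the number of rows
# print(ds[0])   # first record as a dict
```

Passing `streaming=True` may be preferable when only a few records are needed, since it avoids downloading the full dataset before iterating.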
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: 100,000
- Size: 100K<n<1M
- Creator: HuggingFaceTB
- Year: 2024
- License: apache-2.0
- Downloads: 677
- Likes: 49