HuggingFaceTB/cosmopedia-100k

General NLP · EN · apache-2.0

HuggingFaceTB/cosmopedia-100k is a General NLP dataset in English (EN) from HuggingFaceTB with 100,000 records in Parquet format. It is distributed under the apache-2.0 license, falls in the 100K<n<1M size category, and has been downloaded 677 times.

About HuggingFaceTB/cosmopedia-100k

Dataset description: This is a 100k subset of the Cosmopedia dataset, a synthetic dataset of textbooks, blog posts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. Here's how you can load the dataset: from datasets import ...
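The loading snippet in the description is cut off in this listing. A minimal sketch of loading the subset follows. It is not the dataset card's own code. It assumes the Hugging Face `datasets` package is installed and that the repository exposes a split named `train`. The helper names `load_cosmopedia_100k` and `count_by_field` are hypothetical and were introduced here for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_cosmopedia_100k(split: str = "train"):
    """Download the 100k subset from the Hugging Face Hub.

    Assumes the `datasets` package is installed and that a split named
    "train" exists (an assumption; this page does not list the splits).
    """
    # Imported inside the function so the counting helper below can be
    # used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("HuggingFaceTB/cosmopedia-100k", split=split)


def count_by_field(rows: Iterable[Mapping[str, object]], field: str) -> Counter:
    """Tally how many rows share each value of `field`.

    Rows lacking the field are counted under "<missing>". The column name
    you pass is an assumption about the schema, not confirmed here.
    """
    return Counter(str(row.get(field, "<missing>")) for row in rows)
```

If you have network access, `count_by_field(load_cosmopedia_100k(), "format")` would tally rows per content type (textbooks, blog posts, stories, and so on), provided the data has a `format` column. This page does not confirm that part of the schema.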

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: 100,000
Size: 100K<n<1M
Creator: HuggingFaceTB
Year: 2024
License: apache-2.0
Downloads: 677
Likes: 49