OpenDataArena/ODA-Mixture-500k

General NLP, English

OpenDataArena/ODA-Mixture-500k is a General NLP-focused dataset in English distributed in Parquet format.

About OpenDataArena/ODA-Mixture-500k

ODA-Mixture-500k is a large-scale general-purpose post-training dataset curated from top-performing open corpora (selected via the OpenDataArena leaderboard) and refined through deduplication and benchmark decontamination. ...
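
The description mentions deduplication and benchmark decontamination without saying how they are done. As a rough illustration only, and not the OpenDataArena pipeline, the sketch below assumes exact-match deduplication on normalized text and word n-gram overlap against benchmark items. The function names, the normalization rule, and the default n-gram size are all hypothetical choices made for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def deduplicate(samples: list[str]) -> list[str]:
    """Keep the first occurrence of each sample, compared after normalization."""
    seen: set[str] = set()
    kept: list[str] = []
    for sample in samples:
        digest = hashlib.sha256(normalize(sample).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(sample)
    return kept


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def decontaminate(samples: list[str], benchmark_items: list[str], n: int = 8) -> list[str]:
    """Drop any sample that shares at least one word n-gram with a benchmark item.

    The default n=8 is an assumption for this sketch, not a documented setting.
    """
    benchmark_ngrams: set[tuple[str, ...]] = set()
    for item in benchmark_items:
        benchmark_ngrams |= word_ngrams(item, n)
    return [s for s in samples if not (word_ngrams(s, n) & benchmark_ngrams)]
```

A real pipeline would more likely use near-duplicate detection (for example MinHash) and a tuned overlap threshold; exact hashing and a single shared n-gram are used here only to keep the sketch short.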

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: OpenDataArena
Year: 2025