nvidia/ChatQA2-Long-SFT-data
General NLP, EN
nvidia/ChatQA2-Long-SFT-data is an English-language general NLP dataset distributed in Parquet format.
About nvidia/ChatQA2-Long-SFT-data
Data Description
Here, we release the full long SFT training dataset of ChatQA2. It consists of two parts: long_sft and NarrativeQA_131072. The long_sft dataset is built and derived from existing datasets: LongAlpaca12k, GPT-4 samples from Open...
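Because the description names two parts and the card lists Parquet as the format, a minimal Python sketch of reading one part from a local copy might look like the following. The directory layout (`<root>/long_sft/` and `<root>/NarrativeQA_131072/` holding `.parquet` files) and the helper names `find_parquet_files` and `load_part` are assumptions made for illustration, not something the dataset card specifies. Loading also assumes pandas is installed with a Parquet engine such as pyarrow.

```python
from pathlib import Path

# The two parts named in the data description.
PARTS = ("long_sft", "NarrativeQA_131072")


def find_parquet_files(root, part):
    """Return the sorted Parquet files under <root>/<part>.

    The <root>/<part>/ layout is a hypothetical local layout, not a
    documented property of the dataset.
    """
    if part not in PARTS:
        raise ValueError(f"unknown part {part!r}; expected one of {PARTS}")
    part_dir = Path(root) / part
    if not part_dir.is_dir():
        return []
    return sorted(part_dir.rglob("*.parquet"))


def load_part(root, part):
    """Concatenate every Parquet file of one part into a single DataFrame."""
    import pandas as pd  # requires a Parquet engine such as pyarrow

    files = find_parquet_files(root, part)
    if not files:
        raise FileNotFoundError(f"no Parquet files found for part {part!r} under {root}")
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

With a local copy in a directory such as `data/`, `load_part("data", "long_sft")` would return the rows of that part. The column names depend on the actual files and are not assumed here.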
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: nvidia
- Year: 2024