Open-Orca/SlimOrca-Dedup
Text Classification, Question Answering, Text Generation, English
Open-Orca/SlimOrca-Dedup is an English dataset for text classification, question answering, and text generation, distributed in Parquet format.
About Open-Orca/SlimOrca-Dedup
Overview
"SlimOrca Dedup" is a deduplicated, unfiltered subset of the SlimOrca dataset. It excludes RLHF instances and contains 363k unique examples.
Key Features
Removal of RLHF instances.
Deduplication using minhash and Jaccard si...
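The deduplication feature above can be illustrated with a minimal sketch of MinHash signatures used to estimate Jaccard similarity between texts. This is not the dataset authors' pipeline: the shingle size, the number of hash functions (`NUM_PERM`), the similarity threshold (`THRESHOLD`), and all function names are assumptions chosen for illustration, since the listing does not state the actual settings.

```python
import hashlib

NUM_PERM = 64     # assumed number of hash functions; not given in the listing
THRESHOLD = 0.8   # assumed similarity cutoff; the listing truncates the real value


def shingles(text, k=3):
    """Split text into a set of overlapping k-word shingles."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items, num_perm=NUM_PERM):
    """For each salted hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def dedup(texts, threshold=THRESHOLD):
    """Keep a text only if it is not near-identical to one already kept.

    Pairwise comparison is O(n^2); it is fine for a demo, but large corpora
    usually add locality-sensitive hashing to avoid comparing every pair.
    """
    kept, kept_sigs = [], []
    for text in texts:
        sig = minhash_signature(shingles(text))
        if all(estimate_similarity(sig, other) < threshold for other in kept_sigs):
            kept.append(text)
            kept_sigs.append(sig)
    return kept
```

Calling `dedup` on a list that contains the same sentence twice plus one unrelated sentence keeps two entries, because identical texts produce identical signatures.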
Details
- Task: Text Classification, Question Answering, Text Generation
- Language: English
- Format: Parquet
- Rows / instances: 363k unique examples
- Creator: Open-Orca
- Year: 2023