ReactiveAI/Beta-Pre-Train-Corpus
General NLP, English
ReactiveAI/Beta-Pre-Train-Corpus is a general NLP dataset in English, distributed in Parquet format.
About ReactiveAI/Beta-Pre-Train-Corpus
Reactive AI / Beta Pre-Train Corpus
Pre-training corpus for RxT-Beta models, built from public and open datasets. It includes high-quality English and Polish web crawl data, mathematical and scientific subsets,
and code in different programming lang...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: ReactiveAI
- Year: 2025
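
Since the row count is not listed and the corpus ships as Parquet, streaming a few rows is a cheap way to inspect it before committing to a full download. The sketch below is illustrative only. It assumes the corpus is hosted on the Hugging Face Hub under the repository name above, that the third-party `datasets` package is installed, and that a split called `train` exists. The listing confirms none of these. The helper names `take` and `preview` are hypothetical, and a configuration name may be required if the repository defines several subsets.

```python
from itertools import islice

# Repository name as given in the listing; hosting on the Hugging Face Hub is an assumption.
REPO_ID = "ReactiveAI/Beta-Pre-Train-Corpus"


def take(rows, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(rows, n))


def preview(repo_id=REPO_ID, name=None, split="train", n=3):
    """Stream the first n rows without downloading the whole corpus.

    Assumptions (not confirmed by the listing): the repo is on the Hugging Face
    Hub, a split named `split` exists, and `name` selects a configuration if
    the repository defines more than one.
    """
    # Imported lazily so the module loads even when `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True yields rows lazily instead of fetching every Parquet file.
    stream = load_dataset(repo_id, name, split=split, streaming=True)
    return take(stream, n)
```

Calling `preview()` with network access would return a short list of row dictionaries. The column names are whatever the Parquet files define, which this listing does not state.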