
ReactiveAI/Beta-Pre-Train-Corpus

General NLP · English

ReactiveAI/Beta-Pre-Train-Corpus is a General NLP-focused dataset in English distributed in Parquet format.

About ReactiveAI/Beta-Pre-Train-Corpus

Reactive AI / Beta Pre-Train Corpus: a pre-training corpus for RxT-Beta models, created from public and open datasets. It includes high-quality English and Polish web-crawl data, mathematical and scientific subsets, and code in different programming lang...

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: ReactiveAI
Year: 2025