
TigerResearch/pretrain_zh

General NLP, Chinese, Benchmark

The TigerResearch/pretrain_zh dataset is a Chinese general NLP pretraining corpus released by TigerResearch in 2023.

📊 This dataset is used as an LLM benchmark.

About TigerResearch/pretrain_zh

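Dataset Card for "pretrain_zh"

The Chinese portion of the TigerBot pretraining data. It contains (sizes before compression):

Chinese books (zh-books): 12G
Chinese web text (zh-webtext): 25G
Chinese encyclopedia (zh-wiki): 19G

For more corpora, follow the open-source model and its ongoing updates: https://github.com/TigerResearch/TigerBot

Usage

import datasets ds_sft = datasets.load_dat...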
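The usage snippet on this page is cut off, so the following is only a minimal sketch of how the corpus might be inspected, not the card's own example. It assumes the truncated call is `datasets.load_dataset` from the Hugging Face `datasets` library, that the repository id matches the page title, and that a `train` split exists. The helper names `total_size_gb` and `peek` are hypothetical.

```python
from itertools import islice

# Repository id taken from the page title (assumed to be the Hub id).
DATASET_ID = "TigerResearch/pretrain_zh"

# Uncompressed subset sizes quoted in the dataset card, in gigabytes.
SUBSET_SIZES_GB = {"zh-books": 12, "zh-webtext": 25, "zh-wiki": 19}


def total_size_gb(sizes=SUBSET_SIZES_GB):
    """Sum the per-subset sizes listed in the card."""
    return sum(sizes.values())


def peek(n=3, split="train"):
    """Stream the first n records instead of downloading the whole corpus.

    The split name is an assumption; adjust it if the repository differs.
    """
    # Imported lazily so the size helper works without the library installed.
    import datasets

    stream = datasets.load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(stream, n))


print(f"Card lists about {total_size_gb()} GB uncompressed")
```

Calling `peek()` needs the `datasets` package and network access. Streaming is used here because the card lists tens of gigabytes of text, and the column names are not stated on this page, so inspecting `record.keys()` on the returned records is a reasonable first step.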

Details

Task: General NLP
Language: Chinese
Format: Parquet
Rows / instances: N/A
Creator: TigerResearch
Year: 2023