arcinstitute/Stack-CellxGene45M

General NLP · English · Benchmark · cc-by-4.0

The arcinstitute/Stack-CellxGene45M dataset, published by arcinstitute in Parquet format, is listed as a General NLP benchmark dataset in English. It is distributed under the cc-by-4.0 license and has been downloaded 15.8K times.

This dataset is used as an LLM benchmark.

About arcinstitute/Stack-CellxGene45M

CellxGene 45M Collection

A curated subset of CellxGene (~45M cells) used to align the Stack model after pretraining on the full human scBaseCount.

Selection Criteria
≥ 50,000 cells per dataset
≥ 5 donors per dataset
Cell Type A...
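The two size criteria that survive in the description can be read as a per-dataset filter. The sketch below is illustrative only: it assumes each candidate dataset has been summarized as a hypothetical `DatasetSummary` record with cell and donor counts, and it treats both thresholds as inclusive (matching "≥"). The field names and helpers are not the curators' actual pipeline, and the truncated "Cell Type A..." criterion is not modeled.

```python
from dataclasses import dataclass


# Hypothetical per-dataset summary; field names are illustrative and are
# not the schema used by arcinstitute or CellxGene.
@dataclass(frozen=True)
class DatasetSummary:
    dataset_id: str
    n_cells: int
    n_donors: int


# Thresholds from the two visible selection criteria (both inclusive).
MIN_CELLS = 50_000
MIN_DONORS = 5


def passes_selection(ds: DatasetSummary) -> bool:
    """Return True if a dataset meets both the cell-count and donor-count thresholds."""
    return ds.n_cells >= MIN_CELLS and ds.n_donors >= MIN_DONORS


def select_datasets(datasets):
    """Keep only datasets that satisfy the size criteria, preserving input order."""
    return [ds for ds in datasets if passes_selection(ds)]


if __name__ == "__main__":
    candidates = [
        DatasetSummary("ds_a", 120_000, 8),
        DatasetSummary("ds_b", 49_999, 12),  # rejected: too few cells
        DatasetSummary("ds_c", 75_000, 4),   # rejected: too few donors
        DatasetSummary("ds_d", 50_000, 5),   # kept: exactly at both thresholds
    ]
    kept = select_datasets(candidates)
    print([ds.dataset_id for ds in kept])  # ['ds_a', 'ds_d']
```

Because both criteria use "≥", a dataset sitting exactly on a threshold (like `ds_d` above) is kept; a strict `>` comparison would silently drop it.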

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: arcinstitute
Year: 2026
License: cc-by-4.0
Downloads: 15,814
Likes: 6