arcinstitute/opengenome2
Tags: Text Generation, English, Benchmark, apache-2.0
Created by arcinstitute in 2025, arcinstitute/opengenome2 is a text generation benchmark dataset tagged as English and distributed in Parquet format. It has 6.5K downloads and 147 likes, is released under the apache-2.0 license, and falls in the n>1T size category.
This dataset is used as an LLM benchmark.
About arcinstitute/opengenome2
OpenGenome2
OpenGenome2 is a database of nearly 9 trillion base pairs of curated DNA from across all domains of life. Collected from diverse species and public data sources, OpenGenome2 was used to train Evo 2 models. Please refer to the Ev...
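Since the records are DNA sequences stored as text, a quick sanity check on any sample of rows is to tally the bases and compute GC content. The following is a minimal sketch, not part of the dataset's own tooling: the `text` field name, the helper names `base_counts` and `gc_fraction`, and the toy rows are all assumptions for illustration, and the real column names should be checked against the Parquet schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def base_counts(rows: Iterable[Mapping[str, str]], field: str = "text") -> Counter:
    """Tally characters across DNA records, ignoring case.

    `field` is a hypothetical column name; adjust it to the actual schema.
    """
    counts: Counter = Counter()
    for row in rows:
        counts.update(row[field].upper())
    return counts


def gc_fraction(counts: Mapping[str, int]) -> float:
    """Return the share of G and C among unambiguous bases (A, C, G, T)."""
    total = sum(counts.get(base, 0) for base in "ACGT")
    if total == 0:
        return 0.0
    return (counts.get("G", 0) + counts.get("C", 0)) / total


# Toy rows standing in for records read from the Parquet files (assumed shape).
sample_rows = [{"text": "ACGTacgtNN"}, {"text": "GGCC"}]
counts = base_counts(sample_rows)
print(counts["N"], gc_fraction(counts))
```

Because `base_counts` accepts any iterable of dict-like rows, it can be fed from a streaming reader without loading the full dataset into memory, which matters at this scale.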
Details
- Task: Text Generation
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Size: n>1T
- Creator: arcinstitute
- Year: 2025
- License: apache-2.0
- Downloads: 6538
- Likes: 147