code-search-net/code_search_net

Text Generation, Fill Mask, CODE, Benchmark

Created by code-search-net in 2022, code-search-net/code_search_net is a text generation benchmark dataset of source code (language tag: CODE) containing 4,141,072 records in Parquet format. With 22.2K downloads and 331 likes, it is actively used by the community. It is released under a license listed as "other" and falls in the 1M<n<10M size category.

This dataset is used as an LLM benchmark.

About code-search-net/code_search_net

Dataset Card for CodeSearchNet corpus

Dataset Summary

CodeSearchNet corpus is a dataset of 2 million (comment, code) pairs from open-source libraries hosted on GitHub. It contains code and documentation for several programming languages...
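Because each record pairs a documentation comment with the function it describes, the same pair can feed both task types listed above. The sketch below is a minimal, hypothetical illustration of that idea, not the dataset authors' pipeline. It assumes records expose fields named `func_documentation_string`, `func_code_string`, `func_code_tokens`, and `language`; these names are our assumption about the schema and should be checked against the actual Parquet columns. The helpers `to_generation_pair` and `to_fill_mask`, the `<mask>` token, and the sample record are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical record shape. The field names are assumed, not confirmed;
# verify them against the real dataset schema before relying on them.
@dataclass
class CodeSearchNetRecord:
    language: str
    func_documentation_string: str
    func_code_string: str
    func_code_tokens: List[str]


def to_generation_pair(rec: CodeSearchNetRecord) -> Tuple[str, str]:
    """Turn a (comment, code) pair into a (prompt, target) pair for text generation."""
    prompt = f"# Language: {rec.language}\n# {rec.func_documentation_string.strip()}\n"
    return prompt, rec.func_code_string


def to_fill_mask(rec: CodeSearchNetRecord, index: int, mask: str = "<mask>") -> Tuple[List[str], str]:
    """Replace one code token with a mask token; return (masked tokens, original token)."""
    if not 0 <= index < len(rec.func_code_tokens):
        raise IndexError("token index out of range")
    tokens = list(rec.func_code_tokens)  # copy so the record itself is not mutated
    answer = tokens[index]
    tokens[index] = mask
    return tokens, answer


if __name__ == "__main__":
    # Illustrative record invented for this example (not taken from the dataset).
    rec = CodeSearchNetRecord(
        language="python",
        func_documentation_string="Return the sum of a and b.",
        func_code_string="def add(a, b):\n    return a + b",
        func_code_tokens=["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"],
    )
    prompt, target = to_generation_pair(rec)
    print(prompt + target)
    masked, answer = to_fill_mask(rec, 1)
    print(" ".join(masked), "->", answer)
```

Keeping the comment as a code comment in the prompt is one design choice among many; a real training setup may format prompts differently or tokenize with a model-specific tokenizer instead of the pre-split `func_code_tokens`.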

Details

Task: Text Generation, Fill Mask
Language: CODE
Format: Parquet
Rows / instances: 4,141,072
Size: 1M<n<10M
Creator: code-search-net
Year: 2022
License: other
Downloads: 22,168
Likes: 331