princeton-nlp/SWE-bench

General NLP | English | Benchmark

princeton-nlp/SWE-bench is an English-language benchmark dataset, categorized under General NLP and distributed in Parquet format.

📊 This dataset is used as an LLM benchmark.

About princeton-nlp/SWE-bench

Dataset Summary

SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification ...
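The summary says evaluation relies on unit test verification. A minimal sketch of what such a check might look like follows. It assumes each instance lists tests that should go from failing to passing once the issue is fixed, plus tests that must keep passing. The names `is_resolved`, `resolve_rate`, `fail_to_pass`, and `pass_to_pass` are illustrative and do not come from this page, and this is not the official evaluation harness.

```python
from typing import Dict, Iterable


def is_resolved(
    test_status: Dict[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True if every required test passed after applying a candidate patch.

    test_status maps a test name to True (passed) or False (failed).
    A test missing from test_status is treated as failed.
    All parameter names are hypothetical, chosen for this sketch.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(test_status.get(name, False) for name in required)


def resolve_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of instances resolved; 0.0 for an empty collection."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical test results for one Issue-Pull Request pair.
status = {"test_bug_fixed": True, "test_existing_behavior": True}
print(is_resolved(status, ["test_bug_fixed"], ["test_existing_behavior"]))  # True

# Aggregate over several hypothetical instances.
print(resolve_rate([True, False, True, True]))  # 0.75
```

Treating a missing test as a failure is a conservative choice made for this sketch: a patch that prevents the test suite from running should not count as a fix.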

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Creator: princeton-nlp
Year: 2023
