CodeXGLUE: BigCloneBench Dataset

Clone Detection | Coding Lang: Java

Created by Svajlenko & Wang et al. in 2014, the CodeXGLUE: BigCloneBench Dataset is a Java clone detection dataset containing 1,731,860 records in JSON and text formats.

About CodeXGLUE: BigCloneBench Dataset

Given two code snippets as input, the task is binary classification (0/1), where 1 means the two snippets are semantically equivalent and 0 means they are not. Models are evaluated by F1 score.
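
As a rough illustration of the evaluation metric, the sketch below computes F1 from gold labels and model predictions. It assumes F1 is taken over the positive class (1 = clone); the page does not say whether a binary, macro, or other averaging is used, and the function name and sample values are hypothetical.

```python
def f1_score(labels, predictions):
    """F1 for the positive class (assumption: 1 = semantically equivalent pair)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # clones correctly flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # non-clones flagged as clones
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # clones that were missed
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels and predictions for six code pairs (not taken from the dataset).
labels = [1, 0, 1, 1, 0, 0]
predictions = [1, 0, 0, 1, 1, 0]
score = f1_score(labels, predictions)
print(f"F1 = {score:.3f}")
```

Here two clones are found, one is missed, and one non-clone is wrongly flagged, so precision and recall are both 2/3.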

Details

Task: Clone Detection
Language: Java
Format: JSON, Text
Rows / instances: 1,731,860
Creator: Svajlenko & Wang et al.
Year: 2014
