TY - GEN
T1 - Bench4BL
T2 - 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2018
AU - Lee, Jaekwon
AU - Kim, Dongsun
AU - Bissyandé, Tegawendé F.
AU - Jung, Woosung
AU - Le Traon, Yves
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/7/12
Y1 - 2018/7/12
N2 - In recent years, the use of Information Retrieval (IR) techniques to automate the localization of buggy files, given a bug report, has shown promising results. The abundance of approaches in the literature, however, contrasts with the reality of IR-based bug localization (IRBL) adoption by developers (or even by the research community to complement other research approaches). Presumably, this situation is due to the lack of comprehensive evaluations for state-of-the-art approaches which offer insights into the actual performance of the techniques. We report on a comprehensive reproduction study of six state-of-the-art IRBL techniques. This study applies not only subjects used in existing studies (old subjects) but also 46 new subjects (61,431 Java files and 9,459 bug reports) to the IRBL techniques. In addition, the study compares two different version matching (between bug reports and source code files) strategies to highlight our observations related to performance deterioration. We also vary test file inclusion to investigate the effectiveness of IRBL techniques on test files, or its noise impact on performance. Finally, we assess potential performance gain if duplicate bug reports are leveraged.
AB - In recent years, the use of Information Retrieval (IR) techniques to automate the localization of buggy files, given a bug report, has shown promising results. The abundance of approaches in the literature, however, contrasts with the reality of IR-based bug localization (IRBL) adoption by developers (or even by the research community to complement other research approaches). Presumably, this situation is due to the lack of comprehensive evaluations for state-of-the-art approaches which offer insights into the actual performance of the techniques. We report on a comprehensive reproduction study of six state-of-the-art IRBL techniques. This study applies not only subjects used in existing studies (old subjects) but also 46 new subjects (61,431 Java files and 9,459 bug reports) to the IRBL techniques. In addition, the study compares two different version matching (between bug reports and source code files) strategies to highlight our observations related to performance deterioration. We also vary test file inclusion to investigate the effectiveness of IRBL techniques on test files, or its noise impact on performance. Finally, we assess potential performance gain if duplicate bug reports are leveraged.
KW - Bug localization
KW - Information retrieval
KW - Reproducibility studies
UR - http://www.scopus.com/inward/record.url?scp=85051503714&partnerID=8YFLogxK
U2 - 10.1145/3213846.3213856
DO - 10.1145/3213846.3213856
M3 - Conference contribution
AN - SCOPUS:85051503714
T3 - ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
SP - 61
EP - 72
BT - ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis
A2 - Bodden, Eric
A2 - Tip, Frank
PB - Association for Computing Machinery, Inc
Y2 - 16 July 2018 through 21 July 2018
ER -