TY - GEN
T1 - You cannot fix what you cannot find! An investigation of fault localization bias in benchmarking automated program repair systems
AU - Liu, Kui
AU - Koyuncu, Anil
AU - Bissyande, Tegawende F.
AU - Kim, Dongsun
AU - Klein, Jacques
AU - Le Traon, Yves
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
AB - Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by 'tweaking' the FL step. Eventually, we expect to create new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can currently be localized by commonly-used FL techniques; (b) the current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievements with respect to the different tuning parameters implemented in APR systems.
KW - Automated Program Repair
KW - Benchmarking
KW - Bias
KW - Empirical Assessment
KW - Spectrum-based Fault Localization
UR - http://www.scopus.com/inward/record.url?scp=85064184355&partnerID=8YFLogxK
U2 - 10.1109/ICST.2019.00020
DO - 10.1109/ICST.2019.00020
M3 - Conference contribution
AN - SCOPUS:85064184355
T3 - Proceedings - 2019 IEEE 12th International Conference on Software Testing, Verification and Validation, ICST 2019
SP - 102
EP - 113
BT - Proceedings - 2019 IEEE 12th International Conference on Software Testing, Verification and Validation, ICST 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Conference on Software Testing, Verification and Validation, ICST 2019
Y2 - 22 April 2019 through 27 April 2019
ER -