TY - GEN
T1 - Leveraging Fault Localisation to Enhance Defect Prediction
AU - Sohn, Jeongju
AU - Kamei, Yasutaka
AU - McIntosh, Shane
AU - Yoo, Shin
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/3
Y1 - 2021/3
N2 - Software Quality Assurance (SQA) is a resource constrained activity. Research has explored various means of sup-porting that activity. For example, to aid in resource investment decisions, defect prediction identifies modules or changes that are likely to be defective in the future. To support repair activities, fault localisation identifies areas of code that are likely to require change to address known defects. Although the identification and localisation of defects are interdependent tasks, the synergy between defect prediction and fault localisation remains largely underexplored.We hypothesise that modifying code that was suspicious in the past is riskier than modifying code that was not. To validate our hypothesis, in this paper, we employ fault localisation, which localises the root cause of a program failure. We compute the past suspiciousness score of code changes to each fault, and use those scores to (1) define new features for training defect prediction models; and (2) guide the next actions of developers for a commit labelled as fix-inducing. An empirical study of three open-source projects confirms our hypothesis. The new suspiciousness features improve F1 score and balanced accuracy of Just-In-Time (JIT) defect prediction models by 4.2% to 92.2% and by 1.2% to 3.7%, respectively. When guiding developer actions, past code suspiciousness successfully guides developers to a defective file, inspecting two to nine fewer files on average, compared to the baselines based on previous findings on past faults. These results demonstrate the potential of synergies of fault localisation and defect prediction, and lay the groundwork for explorations of that combined space.
AB - Software Quality Assurance (SQA) is a resource constrained activity. Research has explored various means of sup-porting that activity. For example, to aid in resource investment decisions, defect prediction identifies modules or changes that are likely to be defective in the future. To support repair activities, fault localisation identifies areas of code that are likely to require change to address known defects. Although the identification and localisation of defects are interdependent tasks, the synergy between defect prediction and fault localisation remains largely underexplored.We hypothesise that modifying code that was suspicious in the past is riskier than modifying code that was not. To validate our hypothesis, in this paper, we employ fault localisation, which localises the root cause of a program failure. We compute the past suspiciousness score of code changes to each fault, and use those scores to (1) define new features for training defect prediction models; and (2) guide the next actions of developers for a commit labelled as fix-inducing. An empirical study of three open-source projects confirms our hypothesis. The new suspiciousness features improve F1 score and balanced accuracy of Just-In-Time (JIT) defect prediction models by 4.2% to 92.2% and by 1.2% to 3.7%, respectively. When guiding developer actions, past code suspiciousness successfully guides developers to a defective file, inspecting two to nine fewer files on average, compared to the baselines based on previous findings on past faults. These results demonstrate the potential of synergies of fault localisation and defect prediction, and lay the groundwork for explorations of that combined space.
KW - defect prediction
KW - fault localisation
KW - search-based software engineering
KW - software quality assurance
UR - http://www.scopus.com/inward/record.url?scp=85106632725&partnerID=8YFLogxK
U2 - 10.1109/SANER50967.2021.00034
DO - 10.1109/SANER50967.2021.00034
M3 - Conference contribution
AN - SCOPUS:85106632725
T3 - Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021
SP - 284
EP - 294
BT - Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021
Y2 - 9 March 2021 through 12 March 2021
ER -