Automatically Identifying Shared Root Causes of Test Breakages in SAP HANA

Gabin An, Juyeon Yoon, Jeongju Sohn, Jingun Hong, Dongwon Hwang, Shin Yoo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Continuous Integration (CI) of a large-scale software system such as SAP HANA can produce a non-trivial number of test breakages. Each breakage that newly occurs in the daily runs needs to be manually inspected, triaged, and eventually assigned to developers for debugging. However, not all new breakages are unique: some test breakages share the same root cause, and human error can produce duplicate bug tickets for the same root cause. Automated identification of breakages with shared root causes can significantly reduce the cost of the (typically manual) post-breakage steps. This paper investigates multiple similarity functions between test breakages to assist and automate the identification of test breakages that are caused by the same root cause. We consider multiple information sources for this automation: static (i.e., the code itself), historical (i.e., whether the test results have changed in a similar way in the past), and dynamic (i.e., whether the coverage of the test cases is similar). We evaluate a total of 27 individual similarity functions, using real-world CI data of SAP HANA from a six-month period. Further, using these individual similarity functions as input features, we construct a classification model that predicts whether two test breakages share the same root cause. When trained using ground truth labels extracted from the issue tracker of SAP HANA, our model achieves an F1 score of 0.743 when evaluated on a set of unseen test breakages collected over three months. Our results show that a classification model based on test similarity functions can successfully support the bug triage stage of a CI pipeline.
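The idea of feeding pairwise similarity functions into a classifier can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Breakage` record, the two similarity functions, the weights, and the threshold are all hypothetical stand-ins chosen for illustration. The abstract mentions 27 similarity functions and a trained classification model; neither is reproduced here, and static (code-based) similarity is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakage:
    """Hypothetical record for one broken test; all fields are illustrative."""
    test_name: str
    covered_files: frozenset  # dynamic information: files the test executes
    history: tuple            # historical information: past results, True = pass


def coverage_similarity(a, b):
    """Jaccard similarity of the two coverage sets (a dynamic feature)."""
    union = a.covered_files | b.covered_files
    if not union:
        return 0.0
    return len(a.covered_files & b.covered_files) / len(union)


def history_similarity(a, b):
    """Fraction of past runs in which both tests had the same outcome."""
    n = min(len(a.history), len(b.history))
    if n == 0:
        return 0.0
    same = sum(1 for x, y in zip(a.history[:n], b.history[:n]) if x == y)
    return same / n


def pair_features(a, b):
    """Feature vector for a pair of breakages: one entry per similarity function."""
    return [coverage_similarity(a, b), history_similarity(a, b)]


def same_root_cause(a, b, weights=(0.5, 0.5), threshold=0.6):
    """Stand-in for a trained classifier: a thresholded weighted sum.

    The weights and threshold are assumptions, not values from the paper.
    """
    score = sum(w * f for w, f in zip(weights, pair_features(a, b)))
    return score >= threshold


# Toy data (invented for this example).
t1 = Breakage("test_join", frozenset({"join.cpp", "plan.cpp", "exec.cpp"}),
              (True, True, False, True))
t2 = Breakage("test_merge", frozenset({"join.cpp", "plan.cpp"}),
              (True, True, False, True))
t3 = Breakage("test_backup", frozenset({"backup.cpp"}),
              (False, True, True, False))

print(same_root_cause(t1, t2))  # True: overlapping coverage, identical history
print(same_root_cause(t1, t3))  # False: disjoint coverage, mostly different history
```

In a realistic setting the thresholded sum would be replaced by a model trained on labelled pairs, for example pairs of breakages linked to the same ticket in an issue tracker, which is the kind of ground truth the abstract describes.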

Original language: English
Title of host publication: Proceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering
Subtitle of host publication: Software Engineering in Practice, ICSE-SEIP 2022
Publisher: IEEE Computer Society
Pages: 65-74
Number of pages: 10
ISBN (Electronic): 9781665495905
State: Published - 2022
Event: 44th ACM/IEEE International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2022 - Pittsburgh, United States
Duration: 22 May 2022 to 27 May 2022

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 44th ACM/IEEE International Conference on Software Engineering: Software Engineering in Practice, ICSE-SEIP 2022
Country/Territory: United States
City: Pittsburgh
Period: 22/05/22 to 27/05/22

Keywords

  • Continuous Integration
  • Root Cause Analysis
  • Test Similarity
