TY - JOUR
T1 - LADRA
T2 - Log-based abnormal task detection and root-cause analysis in big data processing with Spark
AU - Lu, Siyang
AU - Wei, Xiang
AU - Rao, Bingbing
AU - Tak, Byungchul
AU - Wang, Long
AU - Wang, Liqiang
N1 - Publisher Copyright: © 2018 Elsevier B.V.
PY - 2019/6
Y1 - 2019/6
N2 - As big data processing is widely adopted across many domains, the massive amounts of generated data increasingly rely on parallel computing platforms for analysis, among which Spark is one of the most widely used frameworks. Spark's abnormal tasks may cause significant performance degradation, and it is extremely challenging to detect them and diagnose their root causes. To that end, we propose an innovative tool, named LADRA, for log-based abnormal task detection and root-cause analysis using Spark logs. In LADRA, a log parser first converts raw log files into structured data and extracts features. Then, a detection method identifies where and when abnormal tasks happen. To analyze root causes, we further extract pre-defined factors based on these features. Finally, we leverage a General Regression Neural Network (GRNN) to identify root causes for abnormal tasks. The likelihood of each reported root cause is presented to users according to the factors weighted by the GRNN. LADRA is an off-line tool that can accurately analyze abnormality without extra monitoring overhead. Four potential root causes, i.e., CPU, memory, network, and disk I/O, are considered. We have tested LADRA on three Spark benchmarks by injecting the aforementioned root causes. Experimental results show that our proposed approach is more accurate in root-cause analysis than other existing methods.
AB - As big data processing is widely adopted across many domains, the massive amounts of generated data increasingly rely on parallel computing platforms for analysis, among which Spark is one of the most widely used frameworks. Spark's abnormal tasks may cause significant performance degradation, and it is extremely challenging to detect them and diagnose their root causes. To that end, we propose an innovative tool, named LADRA, for log-based abnormal task detection and root-cause analysis using Spark logs. In LADRA, a log parser first converts raw log files into structured data and extracts features. Then, a detection method identifies where and when abnormal tasks happen. To analyze root causes, we further extract pre-defined factors based on these features. Finally, we leverage a General Regression Neural Network (GRNN) to identify root causes for abnormal tasks. The likelihood of each reported root cause is presented to users according to the factors weighted by the GRNN. LADRA is an off-line tool that can accurately analyze abnormality without extra monitoring overhead. Four potential root causes, i.e., CPU, memory, network, and disk I/O, are considered. We have tested LADRA on three Spark benchmarks by injecting the aforementioned root causes. Experimental results show that our proposed approach is more accurate in root-cause analysis than other existing methods.
KW - Abnormal task
KW - Log analysis
KW - Root cause
KW - Spark
UR - http://www.scopus.com/inward/record.url?scp=85060279327&partnerID=8YFLogxK
U2 - 10.1016/j.future.2018.12.002
DO - 10.1016/j.future.2018.12.002
M3 - Article
AN - SCOPUS:85060279327
SN - 0167-739X
VL - 95
SP - 392
EP - 403
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -