TY - GEN
T1 - LOGAN
T2 - 4th IEEE Annual International Conference on Cloud Engineering, IC2E 2016
AU - Tak, Byung Chul
AU - Tao, Shu
AU - Yang, Lin
AU - Zhu, Chao
AU - Ruan, Yaoping
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/6/1
Y1 - 2016/6/1
N2 - Problem diagnosis is one crucial aspect in the cloud operation that is becoming increasingly challenging. On the one hand, the volume of logs generated in today's cloud is overwhelmingly large. On the other hand, cloud architecture becomes more distributed and complex, which makes it more difficult to troubleshoot failures. In order to address these challenges, we have developed a tool, called LOGAN, that enables operators to quickly identify the log entries that potentially lead to the root cause of a problem. It constructs behavioral reference models from logs that represent the normal patterns. When problem occurs, our tool enables operators to inspect the divergence of current logs from the reference model and highlight logs likely to contain the hints to the root cause. To support these capabilities we have designed and developed several mechanisms. First, we developed log correlation algorithms using various IDs embedded in logs to help identify and isolate log entries that belong to the failed request. Second, we provide efficient log comparison to help understand the differences between different executions. Finally we designed mechanisms to highlight critical log entries that are likely to contain information pertaining to the root cause of the problem. We have implemented the proposed approach in a popular cloud management system, OpenStack, and through case studies, we demonstrate this tool can help operators perform problem diagnosis quickly and effectively.
AB - Problem diagnosis is one crucial aspect in the cloud operation that is becoming increasingly challenging. On the one hand, the volume of logs generated in today's cloud is overwhelmingly large. On the other hand, cloud architecture becomes more distributed and complex, which makes it more difficult to troubleshoot failures. In order to address these challenges, we have developed a tool, called LOGAN, that enables operators to quickly identify the log entries that potentially lead to the root cause of a problem. It constructs behavioral reference models from logs that represent the normal patterns. When problem occurs, our tool enables operators to inspect the divergence of current logs from the reference model and highlight logs likely to contain the hints to the root cause. To support these capabilities we have designed and developed several mechanisms. First, we developed log correlation algorithms using various IDs embedded in logs to help identify and isolate log entries that belong to the failed request. Second, we provide efficient log comparison to help understand the differences between different executions. Finally we designed mechanisms to highlight critical log entries that are likely to contain information pertaining to the root cause of the problem. We have implemented the proposed approach in a popular cloud management system, OpenStack, and through case studies, we demonstrate this tool can help operators perform problem diagnosis quickly and effectively.
UR - http://www.scopus.com/inward/record.url?scp=84978154653&partnerID=8YFLogxK
U2 - 10.1109/IC2E.2016.12
DO - 10.1109/IC2E.2016.12
M3 - Conference contribution
AN - SCOPUS:84978154653
T3 - Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016: Co-located with the 1st IEEE International Conference on Internet-of-Things Design and Implementation, IoTDI 2016
SP - 62
EP - 67
BT - Proceedings - 2016 IEEE International Conference on Cloud Engineering, IC2E 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 April 2016 through 8 April 2016
ER -