TY - GEN
T1 - Multi-class Malware Detection via Deep Graph Convolutional Networks Using TF-IDF-Based Attributed Call Graphs
AU - Khan, Irshad
AU - Kwon, Young Woo
N1 - Publisher Copyright: © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
PY - 2024
Y1 - 2024
AB - The proliferation of malware in the Android ecosystem poses significant security risks and causes financial losses for enterprises and developers. Malware constantly evolves, exhibiting dynamic behavior and complexity, thus making it challenging to develop robust defense mechanisms. Traditional methods, such as signature-based and battery-monitoring approaches, struggle to detect emerging malware variants effectively. Recent advancements in deep learning have shown promising results in Android malware detection. However, most existing approaches focus on binary classification and offer little insight into how well a model generalizes across different types of malware. This study presents a novel approach to address Android malware detection by integrating TF-IDF (Term Frequency-Inverse Document Frequency) features into the call graph structure. By attributing each node in the call graph with TF-IDF-based feature vectors extracted from the opcode sequences of each method using an opcode list, we present a more thorough representation that encapsulates the complex traits of the malware samples. We employ state-of-the-art graph-based deep learning models to classify malware families, including Graph Convolutional Networks (GCN), SAGEConv, Graph Attention Networks (GAT), and Graph Isomorphism Networks (GIN). By incorporating high-level structural information from the call graphs and TF-IDF-based raw features, our approach aims to enhance the accuracy and generality of the malware detection models. We identify an optimal model for the Android malware family classification task through extensive evaluation and comparison of these models. The findings of this study contribute to advancing the field of Android malware detection and provide insights into the effectiveness of graph-based deep learning models for combating evolving malware threats.
KW - call graph
KW - graph convolutional model
KW - Malware
KW - TF-IDF
UR - http://www.scopus.com/inward/record.url?scp=85182588737&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-8024-6_15
DO - 10.1007/978-981-99-8024-6_15
M3 - Conference contribution
AN - SCOPUS:85182588737
SN - 9789819980239
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 188
EP - 200
BT - Information Security Applications - 24th International Conference, WISA 2023, Jeju Island, South Korea, August 23–25, 2023, Revised Selected Papers
A2 - Kim, Howon
A2 - Youn, Jonghee
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Information Security Applications, WISA 2023
Y2 - 23 August 2023 through 25 August 2023
ER -