TY - CHAP
T1 - Android Malware Detection Based on Novel Representations of Apps
AU - Sun, Tiezhu
AU - Daoudi, Nadia
AU - Allix, Kevin
AU - Samhi, Jordan
AU - Kim, Kisub
AU - Zhou, Xin
AU - Kabore, Abdoul Kader
AU - Kim, Dongsun
AU - Lo, David
AU - Bissyandé, Tegawendé François
AU - Klein, Jacques
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - In the past decade, advancements in computer vision (CV) and natural language processing (NLP) have been driven significantly by deep representation learning. This progress has made image and text representation learning appealing for applications in fields like malware detection, where deep learning methods can overcome the limitations of traditional hand-crafted feature-based approaches, offering enhanced adaptability to various malware variants. This chapter introduces two novel approaches in malware representation learning that leverage these advancements: DexRay and DexBERT. DexRay employs image-based techniques, transforming DEX file bytecode of apps into grayscale “vector” images. These images are then analyzed using a one-dimensional convolutional neural model to determine the presence of malware. DexBERT, inspired by the BERT language model, processes Smali instructions disassembled from bytecode to generate high-level embedding vectors. These vectors are pivotal for tasks such as malicious code localization and malware detection. Both DexRay and DexBERT have demonstrated significant improvements over traditional machine learning methods in malware detection, particularly in terms of accuracy, efficiency, and adaptability to new malware types. This chapter delves into the methodologies and experimental results of these techniques, highlighting their contributions to the field of malware detection and offering insights into their potential for broader applications in cybersecurity.
UR - http://www.scopus.com/inward/record.url?scp=85211115715&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-66245-4_8
DO - 10.1007/978-3-031-66245-4_8
M3 - Chapter
AN - SCOPUS:85211115715
T3 - Advances in Information Security
SP - 197
EP - 212
BT - Advances in Information Security
PB - Springer
ER -