 android malware detection


MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research

arXiv.org Artificial Intelligence

Abstract--We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of features and extensive metadata. To ensure accurate malware classification, we employ the VirusTotal API, integrating multiple detection engines for comprehensive and reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its extensive supplementary metadata, totaling more than 400 GB of data and including the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the MH-1M dataset's invaluable role in understanding the evolving landscape of malware. The pervasive spread of Android malware poses a significant challenge for cybersecurity research. This challenge stems mainly from the open-source nature and affordability of Android platforms, which grant users access to a large market of free applications. At the same time, malware continually evolves, adapting its tactics to execute more sophisticated and frequent attacks. Such attacks often result in data destruction, information theft, and several other cybercrimes [1], [2], [3]. Machine learning (ML) algorithms have been widely used to uncover malware and have demonstrated remarkable effectiveness in detection systems, leveraging their discriminative capabilities to identify new variants of malicious applications [4], [5], [6]. To mitigate these risks, researchers have developed a variety of methods for detecting Android malware, establishing machine learning as a central focus of contemporary mobile security research [7], [8], [9]. However, the effectiveness of ML models is highly dependent on the quality of the datasets used for training. 
Many existing datasets suffer from limitations such as outdated data, inadequate representation, and a limited number of samples and features, making them unsuitable for modern malware detection [10], [2], [11], [12]. These issues raise concerns about the reliability of reported performance metrics and can potentially lead to misleading conclusions [2]. A growing body of research in Android malware detection strongly supports the notion that increasing the number of discriminative features can significantly improve classification performance [13], [14], [15]. We present in Table I an overview of widely used Android malware datasets from recent years.
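As a rough illustration of how verdicts from multiple detection engines can be reduced to a single label, the sketch below applies a simple detection-count threshold. The threshold value, the function name `label_from_engines`, and the engine names are assumptions made for this example; the abstract does not state the labeling rule MH-1M uses, and the code does not call the real VirusTotal API.

```python
from typing import Dict


def label_from_engines(verdicts: Dict[str, bool], min_detections: int = 4) -> str:
    """Label a sample 'malware' if at least `min_detections` engines flag it.

    `verdicts` maps a (hypothetical) engine name to True when that engine
    flagged the sample. The default threshold is an illustrative assumption.
    """
    positives = sum(1 for flagged in verdicts.values() if flagged)
    return "malware" if positives >= min_detections else "benign"


# Hypothetical per-engine verdicts for one application.
report = {
    "EngineA": True,
    "EngineB": False,
    "EngineC": True,
    "EngineD": True,
    "EngineE": True,
}
print(label_from_engines(report))
```

Raising `min_detections` trades recall for fewer false positives from individual engines, which is why the threshold is usually reported alongside a dataset's labels.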


Android Malware Detection: A Machine Leaning Approach

arXiv.org Artificial Intelligence

This study examines machine learning techniques like Decision Trees, Support Vector Machines, Logistic Regression, Neural Networks, and ensemble methods to detect Android malware. The study evaluates these models on a dataset of Android applications and analyzes their accuracy, efficiency, and real-world applicability. Key findings show that ensemble methods demonstrate superior performance, but there are trade-offs between model interpretability, efficiency, and accuracy. Given the increasing threat of Android malware, these insights guide future research and practical use of ML to combat it. I. INTRODUCTION Smartphones have brought in a new era of connectivity, convenience, and innovation, with Android being the most widely used mobile operating system [1], [2]. However, this ubiquity has come with challenges. The characteristics that make Android popular also leave it vulnerable to malicious activities. Specifically, Android's open-source nature, vast user base, and easy application distribution and installation have created an environment where cybercriminals can thrive. Thus, it is essential to understand the Android ecosystem's unique landscape to address the severe threat of Android malware. The following section sets the stage for exploring advanced malware detection techniques for Android devices in later sections. A. Background The extensive adoption of Android operating systems, with their open-source nature and customization capabilities, has led to them becoming a primary target for cybercriminals. Android's vast and diverse application ecosystem presents significant security challenges, as malicious applications can masquerade as legitimate ones, exploiting vulnerabilities and employing social engineering tactics [1]-[3]. These malicious activities include stealing sensitive information, sending premium-rate SMS messages, and installing additional payloads [4]-[5].


BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection

arXiv.org Artificial Intelligence

Mobile app markets host millions of apps, yet undesired behaviors (e.g., disruptive ads, illegal redirection, payment deception) remain hard to catch because they often do not rely on permission-protected APIs and can be easily camouflaged via UI or metadata edits. We present BINCTX, a learning approach that builds multi-modal representations of an app from (i) a global bytecode-as-image view that captures code-level semantics and family-style patterns, (ii) a contextual view (manifested actions, components, declared permissions, URL/IP constants) indicating how behaviors are triggered, and (iii) a third-party-library usage view summarizing invocation frequencies along inter-component call paths. The three views are embedded and fused to train a contextual-aware classifier. On real-world malware and benign apps, BINCTX attains a macro F1 of 94.73%, outperforming strong baselines by at least 14.92%. It remains robust under commercial obfuscation (F1 84% post-obfuscation) and is more resistant to adversarial samples than state-of-the-art bytecode-only systems.
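For readers unfamiliar with the macro F1 metric reported above, the following minimal sketch computes it as the unweighted mean of per-class F1 scores. The function name `macro_f1` and the toy class labels are assumptions for illustration and are not taken from the BINCTX evaluation.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical behavior classes).
y_true = ["ads", "ads", "redirect", "benign"]
y_pred = ["ads", "benign", "redirect", "benign"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Because every class contributes equally, macro F1 penalizes poor performance on rare behavior classes more than accuracy or micro-averaged F1 would.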


DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection

arXiv.org Artificial Intelligence

Over the last decade, machine learning has been extensively applied to identify malicious Android applications. However, such approaches remain vulnerable against adversarial examples, i.e., examples that are subtly manipulated to fool a machine learning model into making incorrect predictions. This research presents DeepTrust, a novel metaheuristic that arranges flexible classifiers, like deep neural networks, into an ordered sequence where the final decision is made by a single internal model based on conditions activated in cascade. In the Robust Android Malware Detection competition at the 2025 IEEE Conference SaTML, DeepTrust secured the first place and achieved state-of-the-art results, outperforming the next-best competitor by up to 266% under feature-space evasion attacks. This is accomplished while maintaining the highest detection rate on non-adversarial malware and a false positive rate below 1%. The method's efficacy stems from maximizing the divergence of the learned representations among the internal models. By using classifiers inducing fundamentally dissimilar embeddings of the data, the decision space becomes unpredictable for an attacker. This frustrates the iterative perturbation process inherent to evasion attacks, enhancing system robustness without compromising accuracy on clean examples.


Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection

arXiv.org Artificial Intelligence

This study examines the impact of concept drift on Android malware detection, evaluating two datasets and nine machine learning and deep learning algorithms, as well as Large Language Models (LLMs). Various feature types--static, dynamic, hybrid, semantic, and image-based--were considered. The results showed that concept drift is widespread and significantly affects model performance. Factors influencing the drift include feature types, data environments, and detection methods. Balancing algorithms helped with class imbalance but did not fully address concept drift, which primarily stems from the dynamic nature of the malware landscape. No strong link was found between the type of algorithm used and concept drift; the impact was relatively minor compared to other variables, since hyperparameters were not fine-tuned and the default algorithm configurations were used. While LLMs using few-shot learning demonstrated promising detection performance, they did not fully mitigate concept drift, highlighting the need for further investigation.


Understanding Concept Drift with Deprecated Permissions in Android Malware Detection

arXiv.org Artificial Intelligence

Abstract--Permission analysis is a widely used method for Android malware detection. It involves examining the permissions requested by an application to access sensitive data or perform potentially malicious actions. In recent years, various machine learning (ML) algorithms have been applied to Android malware detection using permission-based features and feature selection techniques, often achieving high accuracy. However, these studies have largely overlooked important factors such as protection levels and the deprecation or restriction of permissions due to updates in the Android OS--factors that can contribute to concept drift. In this study, we investigate the impact of deprecated and restricted permissions on the performance of machine learning models. A large dataset containing 166 permissions was used, encompassing more than 70,000 malware and benign applications. Various machine learning and deep learning algorithms were employed as classifiers, along with different concept drift detection strategies. The results suggest that Android permissions are highly effective features for malware detection, with the exclusion of deprecated and restricted permissions having only a marginal impact on model performance. In some cases, such as with CNN, accuracy improved. Excluding these permissions also enhanced the detection of concept drift using a year-to-year analysis strategy. Dataset balancing further improved model performance, reduced low-accuracy instances, and enhanced concept drift detection via the Kolmogorov-Smirnov test. Mobile devices are an essential tool in everyday life, providing users with access to a wide range of applications for communication, banking, entertainment, and productivity. Two operating systems dominate the mobile market, Google Android and Apple iOS, with Android taking 71% of the market share by 2024 [1]. 
Android employs a permission-based security model that grants applications specific privileges to regulate access to sensitive resources.
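The abstract above mentions the Kolmogorov-Smirnov test as a concept drift detector. The sketch below computes only the two-sample KS statistic (the largest gap between two empirical CDFs) for a single numeric quantity measured in two periods. What the authors actually compare, and how they turn the statistic into a drift decision, is not stated in the abstract; the function name `ks_statistic` and the sample values are assumptions for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all sample points."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Empirical CDFs are step functions, so the largest gap occurs at a sample value.
    for x in set(a_sorted) | set(b_sorted):
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical per-batch scores from two consecutive years.
year_1 = [0.1, 0.2, 0.3, 0.4]
year_2 = [0.3, 0.5, 0.6, 0.7]
print(ks_statistic(year_1, year_2))  # 0.75
```

A statistic near 0 suggests the two periods share a distribution, while a value near 1 suggests a shift; a full test would also derive a p-value from the statistic and the sample sizes.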


MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation

arXiv.org Artificial Intelligence

Vanderson Rocha (1), Diego Kreutz (2), Gabriel Canto (1), Hendrio Bragança (1), Eduardo Feitosa (1). (1) Federal University of Amazonas (UFAM); (2) Federal University of Pampa (UNIPAMPA). Abstract--Feature selection is vital for building effective predictive models, as it reduces dimensionality and emphasizes key features. However, current research often suffers from limited benchmarking and reliance on proprietary datasets. This severely hinders reproducibility and can negatively impact overall performance. To address these limitations, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that account for these asymmetries. We demonstrate the importance of a unified platform for comparing diverse feature selection techniques, fostering methodological consistency and rigor. By providing this framework, we aim to significantly broaden the existing literature and pave the way for new research directions in feature selection, particularly within the context of Android malware detection. INTRODUCTION Feature selection is crucial for constructing effective predictive models. 
By identifying and focusing on the most relevant feature subsets, it reduces data dimensionality, leading to improved model accuracy and significantly decreased computational overhead during training [1].
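To make the idea of selecting the most relevant feature subset concrete, here is a minimal sketch of a classical filter method: ranking binary features by their mutual information with the class label and keeping the top k. Mutual information is chosen here for illustration only; the abstract does not list which 17 methods MH-FSF implements, and the function names and toy permission columns are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Mutual information (in bits) between a discrete feature and the label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def top_k_features(columns: Dict[str, Sequence[int]], labels: Sequence[int], k: int) -> List[str]:
    """Return the names of the k features with the highest mutual information."""
    ranked = sorted(columns, key=lambda name: mutual_information(columns[name], labels), reverse=True)
    return ranked[:k]


# Toy data: 1 = permission requested; label 1 = malware (hypothetical values).
labels = [1, 1, 0, 0]
columns = {
    "SEND_SMS": [1, 1, 0, 0],   # perfectly aligned with the label
    "INTERNET": [1, 1, 1, 1],   # constant, carries no information
    "CAMERA": [1, 0, 1, 0],     # independent of the label
}
print(top_k_features(columns, labels, k=1))  # ['SEND_SMS']
```

Filter methods like this score each feature independently of any classifier, which keeps them cheap on large feature sets but blind to interactions between features.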


VOLTRON: Detecting Unknown Malware Using Graph-Based Zero-Shot Learning

arXiv.org Artificial Intelligence

The persistent threat of Android malware presents a serious challenge to the security of millions of users globally. While many machine learning-based methods have been developed to detect these threats, their reliance on large labeled datasets limits their effectiveness against emerging, previously unseen malware families, for which labeled data is scarce or nonexistent. To address this challenge, we introduce a novel zero-shot learning framework that combines Variational Graph Auto-Encoders (VGAE) with Siamese Neural Networks (SNN) to identify malware without needing prior examples of specific malware families. Our approach leverages graph-based representations of Android applications, enabling the model to detect subtle structural differences between benign and malicious software, even in the absence of labeled data for new threats. Experimental results show that our method outperforms the state-of-the-art MaMaDroid, especially in zero-day malware detection. Our model achieves 96.24% accuracy and 95.20% recall for unknown malware families, highlighting its robustness against evolving Android threats.


LAMD: Context-driven Android Malware Detection and Classification with LLMs

arXiv.org Artificial Intelligence

The rapid growth of mobile applications has escalated Android malware threats. Although there are numerous detection methods, they often struggle with evolving attacks, dataset biases, and limited explainability. Large Language Models (LLMs) offer a promising alternative with their zero-shot inference and reasoning capabilities. However, applying LLMs to Android malware detection presents two key challenges: (1) the extensive support code in Android applications, often spanning thousands of classes, exceeds LLMs' context limits and obscures malicious behavior within benign functionality; (2) the structural complexity and interdependencies of Android applications surpass LLMs' sequence-based reasoning, fragmenting code analysis and hindering malicious intent inference. To address these challenges, we propose LAMD, a practical context-driven framework to enable LLM-based Android malware detection. LAMD integrates key context extraction to isolate security-critical code regions and construct program structures, then applies tier-wise code reasoning to analyze application behavior progressively, from low-level instructions to high-level semantics, providing final prediction and explanation. A well-designed factual consistency verification mechanism is equipped to mitigate LLM hallucinations from the first tier. Evaluation in real-world settings demonstrates LAMD's effectiveness over conventional detectors, establishing a feasible basis for LLM-driven malware analysis in dynamic threat landscapes.


XAI and Android Malware Models

arXiv.org Artificial Intelligence

Android malware detection based on machine learning (ML) and deep learning (DL) models is widely used for mobile device security. Such models offer benefits in terms of detection accuracy and efficiency, but it is often difficult to understand how such learning models make decisions. As a result, these popular malware detection strategies are generally treated as black boxes, which can result in a lack of trust in the decisions made, as well as making adversarial attacks more difficult to detect. The field of eXplainable Artificial Intelligence (XAI) attempts to shed light on such black box models. In this paper, we apply XAI techniques to ML and DL models that have been trained on a challenging Android malware classification problem. Specifically, the classic ML models considered are Support Vector Machines (SVM), Random Forest, and $k$-Nearest Neighbors ($k$-NN), while the DL models we consider are Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN). The state-of-the-art XAI techniques that we apply to these trained models are Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), PDP plots, ELI5, and Class Activation Mapping (CAM). We obtain global and local explanation results, and we discuss the utility of XAI techniques in this problem domain. We also provide a literature review of XAI work related to Android malware.