AITopics | android malware detection

MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research

Braganca, Hendrio, Kreutz, Diego, Rocha, Vanderson, Assolin, Joner, Feitosa, Eduardo

arXiv.org Artificial Intelligence, Nov-4-2025

Abstract--We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of features and extensive metadata. To ensure accurate malware classification, we employ the VirusTotal API, integrating multiple detection engines for comprehensive and reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its extensive supplementary metadata, totaling more than 400 GB of data and including the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the MH-1M dataset's invaluable role in understanding the evolving landscape of malware.

The pervasive spread of Android malware poses a significant challenge for cybersecurity research. This challenge stems mainly from the open-source nature and affordability of Android platforms, which grant users access to a large market of free applications. At the same time, malware continually evolves, adapting its tactics to execute more sophisticated and frequent attacks. Such attacks often result in data destruction, information theft, and several other cybercrimes [1], [2], [3]. Machine learning (ML) algorithms have been widely used to uncover malware and have demonstrated remarkable effectiveness in detection systems, leveraging their discriminative capabilities to identify new variants of malicious applications [4], [5], [6]. To mitigate these risks, researchers have developed a variety of methods for detecting Android malware, establishing machine learning as a central focus of contemporary mobile security research [7], [8], [9]. However, the effectiveness of ML models is highly dependent on the quality of the datasets used for training.
Many existing datasets suffer from limitations such as outdated data, inadequate representation, and a limited number of samples and features, making them unsuitable for modern malware detection [10], [2], [11], [12]. These issues raise concerns about the reliability of reported performance metrics and can potentially lead to misleading conclusions [2]. A growing body of research in Android malware detection strongly supports the notion that increasing the number of discriminative features can significantly improve classification performance [13], [14], [15]. We present in Table I an overview of widely used Android malware datasets from recent years.
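
The abstract above says labels come from VirusTotal reports that combine multiple detection engines, but it does not state the labeling rule. The following is a minimal sketch of one common approach, a detection-count threshold. The function name, the `min_detections` value, and the simplified report dictionary are all hypothetical; they are not taken from the paper or from the actual VirusTotal API schema.

```python
from typing import Dict


def label_from_engines(engine_verdicts: Dict[str, bool], min_detections: int = 4) -> str:
    """Label a sample from per-engine verdicts (True means the engine flagged it).

    min_detections is a hypothetical threshold, not a value from the paper.
    """
    # Count how many engines flagged the sample as malicious.
    detections = sum(1 for flagged in engine_verdicts.values() if flagged)
    return "malware" if detections >= min_detections else "benign"


# Hypothetical, simplified report: engine name -> flagged?
report = {
    "EngineA": True,
    "EngineB": True,
    "EngineC": False,
    "EngineD": True,
    "EngineE": True,
}
print(label_from_engines(report))
```

Raising the threshold trades fewer false positives for more missed malware, so any threshold used in practice should be reported alongside the results.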

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.00342

Country: South America > Brazil (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Android Malware Detection: A Machine Leaning Approach

Abdulla, Hasan

arXiv.org Artificial Intelligence, Nov-4-2025

This study examines machine learning techniques like Decision Trees, Support Vector Machines, Logistic Regression, Neural Networks, and ensemble methods to detect Android malware. The study evaluates these models on a dataset of Android applications and analyzes their accuracy, efficiency, and real-world applicability. Key findings show that ensemble methods demonstrate superior performance, but there are trade-offs between model interpretability, efficiency, and accuracy. Given the increasing threat of Android malware, these insights guide future research and the practical use of ML to combat it.

I. INTRODUCTION

Smartphones have brought in a new era of connectivity, convenience, and innovation, with Android being the most widely used mobile operating system [1], [2]. However, this ubiquity has come with challenges. The background of Android's ecosystem makes clear that the characteristics that make Android popular also leave it vulnerable to malicious activities. Specifically, Android's open-source nature, vast user base, and easy application distribution and installation have created an environment where cybercriminals can thrive. Thus, it is essential to understand the Android ecosystem's unique landscape to address the severe threat of Android malware. The following section sets the stage for exploring advanced malware detection techniques for Android devices in later sections.

A. Background

The extensive adoption of Android operating systems, with their open-source nature and customization capabilities, has led to them becoming a primary target for cybercriminals. Android's vast and diverse application ecosystem presents significant security challenges, as malicious applications can masquerade as legitimate ones, exploiting vulnerabilities and employing social engineering tactics [1]-[3]. These malicious activities include stealing sensitive information, sending premium-rate SMS messages, and installing additional payloads [4]-[5].

artificial intelligence, machine learning, malware detection, (13 more...)

arXiv.org Artificial Intelligence

2511.00894

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection

Liu, Zichen, Yang, Shao, Xiao, Xusheng

arXiv.org Artificial Intelligence, Oct-17-2025

Mobile app markets host millions of apps, yet undesired behaviors (e.g., disruptive ads, illegal redirection, payment deception) remain hard to catch because they often do not rely on permission-protected APIs and can be easily camouflaged via UI or metadata edits. We present BinCtx, a learning approach that builds multi-modal representations of an app from (i) a global bytecode-as-image view that captures code-level semantics and family-style patterns, (ii) a contextual view (manifested actions, components, declared permissions, URL/IP constants) indicating how behaviors are triggered, and (iii) a third-party-library usage view summarizing invocation frequencies along inter-component call paths. The three views are embedded and fused to train a context-aware classifier. On real-world malware and benign apps, BinCtx attains a macro F1 of 94.73%, outperforming strong baselines by at least 14.92%. It remains robust under commercial obfuscation (F1 84% post-obfuscation) and is more resistant to adversarial samples than state-of-the-art bytecode-only systems.
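
To make the "bytecode-as-image" view more concrete, here is a minimal sketch of one way raw bytes can be laid out as a 2D grayscale grid. The abstract does not describe the actual encoding, so the fixed row width, the zero padding, and the function name `bytes_to_image` are assumptions made purely for illustration.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay out raw bytes as rows of grayscale pixel values (0-255).

    The fixed width and zero padding are illustrative assumptions,
    not the encoding used by BinCtx.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Pad the final row with zeros so every row has the same length.
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


# Toy input standing in for the first bytes of a bytecode file.
image = bytes_to_image(b"dex\n035\x00", width=4)
for row in image:
    print(row)
```

A grid like this can then be fed to an image model; how the image is resized or normalized before that step is left open here.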

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.14344

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(3 more...)

DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection

Pulido-Cortázar, Daniel, Gibert, Daniel, Manyà, Felip

arXiv.org Artificial Intelligence, Oct-15-2025

Over the last decade, machine learning has been extensively applied to identify malicious Android applications. However, such approaches remain vulnerable to adversarial examples, i.e., examples that are subtly manipulated to fool a machine learning model into making incorrect predictions. This research presents DeepTrust, a novel metaheuristic that arranges flexible classifiers, like deep neural networks, into an ordered sequence where the final decision is made by a single internal model based on conditions activated in cascade. In the Robust Android Malware Detection competition at the 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), DeepTrust secured first place and achieved state-of-the-art results, outperforming the next-best competitor by up to 266% under feature-space evasion attacks. This is accomplished while maintaining the highest detection rate on non-adversarial malware and a false positive rate below 1%. The method's efficacy stems from maximizing the divergence of the learned representations among the internal models. By using classifiers inducing fundamentally dissimilar embeddings of the data, the decision space becomes unpredictable for an attacker. This frustrates the iterative perturbation process inherent to evasion attacks, enhancing system robustness without compromising accuracy on clean examples.
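
The idea of an ordered sequence in which a single internal model makes the final decision can be illustrated with a generic cascade. This is only a sketch: the abstract does not specify the activation conditions, so the confidence thresholds, the `cascade_predict` name, and the toy models below are assumptions and should not be read as DeepTrust's actual procedure.

```python
from typing import Callable, List, Sequence, Tuple

# A model maps a feature vector to an estimated probability of malware.
Model = Callable[[Sequence[float]], float]


def cascade_predict(x: Sequence[float], stages: List[Tuple[Model, float]]) -> Tuple[int, int]:
    """Return (label, index of the stage that decided).

    Each stage is (model, min_confidence). The first stage whose confidence
    reaches its threshold decides alone; the last stage always decides.
    The confidence-threshold condition is an illustrative assumption.
    """
    for i, (model, min_conf) in enumerate(stages):
        p = model(x)
        confidence = max(p, 1.0 - p)
        is_last = i == len(stages) - 1
        if confidence >= min_conf or is_last:
            return (1 if p >= 0.5 else 0), i
    raise ValueError("stages must not be empty")


# Toy stand-ins for trained classifiers (hypothetical).
unsure_model: Model = lambda x: 0.55
confident_model: Model = lambda x: 0.90

result = cascade_predict([0.0, 1.0], [(unsure_model, 0.8), (confident_model, 0.0)])
print(result)
```

In this toy run the first model is not confident enough, so the decision falls to the second model alone, which mirrors the "single internal model decides" structure at a very high level.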

artificial intelligence, machine learning, robustness, (18 more...)

arXiv.org Artificial Intelligence

2510.1231

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Empirical Evaluation of Concept Drift in ML-Based Android Malware Detection

Sabbah, Ahmed, Jarrar, Radi, Zein, Samer, Mohaisen, David

arXiv.org Artificial Intelligence, Jul-31-2025

This study examines the impact of concept drift on Android malware detection, evaluating two datasets and nine machine learning and deep learning algorithms, as well as Large Language Models (LLMs). Various feature types (static, dynamic, hybrid, semantic, and image-based) were considered. The results showed that concept drift is widespread and significantly affects model performance. Factors influencing the drift include feature types, data environments, and detection methods. Balancing algorithms helped with class imbalance but did not fully address concept drift, which primarily stems from the dynamic nature of the malware landscape. No strong link was found between the type of algorithm used and concept drift; the impact was relatively minor compared to other variables, since hyperparameters were not fine-tuned and the default algorithm configurations were used. While LLMs using few-shot learning demonstrated promising detection performance, they did not fully mitigate concept drift, highlighting the need for further investigation.
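
One simple way to observe concept drift is to score a fixed model on samples grouped by time period and watch the accuracy change. The sketch below assumes each sample carries a year, a feature set, and a label; the data, the rule-based `predict` function, and the helper name `accuracy_by_year` are hypothetical and are not drawn from the study.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Sample = Tuple[int, FrozenSet[str], int]  # (year, features, label: 1 = malware)


def accuracy_by_year(samples: Iterable[Sample],
                     predict: Callable[[FrozenSet[str]], int]) -> Dict[int, float]:
    """Compute accuracy separately for each year of samples."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for year, features, label in samples:
        total[year] += 1
        correct[year] += int(predict(features) == label)
    return {year: correct[year] / total[year] for year in sorted(total)}


# Toy rule standing in for a model fitted on older data (hypothetical).
def predict(features: FrozenSet[str]) -> int:
    return 1 if "SEND_SMS" in features else 0


samples = [
    (2019, frozenset({"SEND_SMS"}), 1),
    (2019, frozenset({"INTERNET"}), 0),
    (2022, frozenset({"INTERNET"}), 1),  # newer malware no longer matches the old rule
    (2022, frozenset({"INTERNET"}), 0),
]
scores = accuracy_by_year(samples, predict)
print(scores)
```

A falling score on later years is a symptom of drift, not proof of its cause; changes in class balance or data collection can produce the same pattern.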

concept drift, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.22772

Country:

North America > United States (0.46)
Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Understanding Concept Drift with Deprecated Permissions in Android Malware Detection

Sabbah, Ahmed, Jarrar, Radi, Zein, Samer, Mohaisen, David

arXiv.org Artificial Intelligence, Jul-31-2025

Abstract--Permission analysis is a widely used method for Android malware detection. It involves examining the permissions requested by an application to access sensitive data or perform potentially malicious actions. In recent years, various machine learning (ML) algorithms have been applied to Android malware detection using permission-based features and feature selection techniques, often achieving high accuracy. However, these studies have largely overlooked important factors such as protection levels and the deprecation or restriction of permissions due to updates in the Android OS--factors that can contribute to concept drift. In this study, we investigate the impact of deprecated and restricted permissions on the performance of machine learning models. A large dataset containing 166 permissions was used, encompassing more than 70,000 malware and benign applications. Various machine learning and deep learning algorithms were employed as classifiers, along with different concept drift detection strategies. The results suggest that Android permissions are highly effective features for malware detection, with the exclusion of deprecated and restricted permissions having only a marginal impact on model performance. In some cases, such as with CNN, accuracy improved. Excluding these permissions also enhanced the detection of concept drift using a year-to-year analysis strategy. Dataset balancing further improved model performance, reduced low-accuracy instances, and enhanced concept drift detection via the Kolmogorov-Smirnov test.

Mobile devices are an essential tool in everyday life, providing users with access to a wide range of applications for communication, banking, entertainment, and productivity. Two operating systems dominate the mobile market, Google Android and Apple iOS, with Android taking 71% of the market share by 2024 [1].
Android employs a permission-based security model that grants applications specific privileges to regulate access to sensitive resources.
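
The Kolmogorov-Smirnov test mentioned in the abstract compares two samples through the largest gap between their empirical distribution functions. Below is a minimal sketch of the two-sample KS statistic only (no p-value). The choice of "number of requested permissions per app" as the compared quantity and the toy values are assumptions for illustration, not the study's setup.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sorted_a, sorted_b = sorted(a), sorted(b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical permission counts per app in two different years.
year_one = [1, 2, 3, 4]
year_two = [3, 4, 5, 6]
print(ks_statistic(year_one, year_two))
```

A statistic near 0 suggests similar distributions and a value near 1 suggests a strong shift; for a significance test, a library routine such as `scipy.stats.ks_2samp` returns both the statistic and a p-value.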

artificial intelligence, machine learning, permission, (16 more...)

arXiv.org Artificial Intelligence

2507.22231

Country:

Asia (0.46)
North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

MH-FSF: A Unified Framework for Overcoming Benchmarking and Reproducibility Limitations in Feature Selection Evaluation

Rocha, Vanderson, Kreutz, Diego, Canto, Gabriel, Bragança, Hendrio, Feitosa, Eduardo

arXiv.org Artificial Intelligence, Jul-16-2025

Vanderson Rocha (1), Diego Kreutz (2), Gabriel Canto (1), Hendrio Bragança (1), Eduardo Feitosa (1)
(1) Federal University of Amazonas (UFAM); (2) Federal University of Pampa (UNIPAMPA)

Abstract--Feature selection is vital for building effective predictive models, as it reduces dimensionality and emphasizes key features. However, current research often suffers from limited benchmarking and reliance on proprietary datasets. This severely hinders reproducibility and can negatively impact overall performance. To address these limitations, we introduce the MH-FSF framework, a comprehensive, modular, and extensible platform designed to facilitate the reproduction and implementation of feature selection methods. Developed through collaborative research, MH-FSF provides implementations of 17 methods (11 classical, 6 domain-specific) and enables systematic evaluation on 10 publicly available Android malware datasets. Our results reveal performance variations across both balanced and imbalanced datasets, highlighting the critical need for data preprocessing and selection criteria that account for these asymmetries. We demonstrate the importance of a unified platform for comparing diverse feature selection techniques, fostering methodological consistency and rigor. By providing this framework, we aim to significantly broaden the existing literature and pave the way for new research directions in feature selection, particularly within the context of Android malware detection.

INTRODUCTION

Feature selection is crucial for constructing effective predictive models.
By identifying and focusing on the most relevant feature subsets, it reduces data dimensionality, leading to improved model accuracy and significantly decreased computational overhead during training [1].
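
As a concrete illustration of filter-style feature selection, the sketch below ranks discrete features by information gain and keeps the top k. The abstract does not name the 17 methods in MH-FSF, so this is a generic textbook technique chosen for illustration; the function names and the toy data are hypothetical and are not part of the framework.

```python
import math
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = list(labels).count(cls) / n
        h -= p * math.log2(p)
    return h


def information_gain(column: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def top_k_features(X: List[List[int]], y: List[int], k: int) -> List[int]:
    """Return indices of the k features with the highest information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy binary data: feature 0 matches the label, feature 1 is constant, feature 2 is noise.
X = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1]]
y = [1, 1, 0, 0]
selected = top_k_features(X, y, k=1)
print(selected)
```

Filter methods like this score each feature independently, which keeps them cheap but blind to interactions between features.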

artificial intelligence, feature selection method, machine learning, (7 more...)

arXiv.org Artificial Intelligence

2507.10591

Country: South America > Brazil (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

VOLTRON: Detecting Unknown Malware Using Graph-Based Zero-Shot Learning

Akdeniz, M. Tahir, Yeşilkaya, Zeynep, Köse, İ. Enes, Ünal, İ. Ulaş, Şen, Sevil

arXiv.org Artificial Intelligence, Jul-8-2025

The persistent threat of Android malware presents a serious challenge to the security of millions of users globally. While many machine learning-based methods have been developed to detect these threats, their reliance on large labeled datasets limits their effectiveness against emerging, previously unseen malware families, for which labeled data is scarce or nonexistent. To address this challenge, we introduce a novel zero-shot learning framework that combines Variational Graph Auto-Encoders (VGAE) with Siamese Neural Networks (SNN) to identify malware without needing prior examples of specific malware families. Our approach leverages graph-based representations of Android applications, enabling the model to detect subtle structural differences between benign and malicious software, even in the absence of labeled data for new threats. Experimental results show that our method outperforms the state-of-the-art MaMaDroid, especially in zero-day malware detection. Our model achieves 96.24% accuracy and 95.20% recall for unknown malware families, highlighting its robustness against evolving Android threats.

machine learning, malware family, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.04275

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

LAMD: Context-driven Android Malware Detection and Classification with LLMs

Qian, Xingzhi, Zheng, Xinran, He, Yiling, Yang, Shuo, Cavallaro, Lorenzo

arXiv.org Artificial Intelligence, Feb-18-2025

The rapid growth of mobile applications has escalated Android malware threats. Although there are numerous detection methods, they often struggle with evolving attacks, dataset biases, and limited explainability. Large Language Models (LLMs) offer a promising alternative with their zero-shot inference and reasoning capabilities. However, applying LLMs to Android malware detection presents two key challenges: (1) the extensive support code in Android applications, often spanning thousands of classes, exceeds LLMs' context limits and obscures malicious behavior within benign functionality; (2) the structural complexity and interdependencies of Android applications surpass LLMs' sequence-based reasoning, fragmenting code analysis and hindering malicious intent inference. To address these challenges, we propose LAMD, a practical context-driven framework to enable LLM-based Android malware detection. LAMD integrates key context extraction to isolate security-critical code regions and construct program structures, then applies tier-wise code reasoning to analyze application behavior progressively, from low-level instructions to high-level semantics, providing final prediction and explanation. A well-designed factual consistency verification mechanism is equipped to mitigate LLM hallucinations from the first tier. Evaluation in real-world settings demonstrates LAMD's effectiveness over conventional detectors, establishing a feasible basis for LLM-driven malware analysis in dynamic threat landscapes.

large language model, machine learning, malware detection, (18 more...)

arXiv.org Artificial Intelligence

2502.13055

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

XAI and Android Malware Models

Kulkarni, Maithili, Stamp, Mark

arXiv.org Artificial Intelligence, Nov-25-2024

Android malware detection based on machine learning (ML) and deep learning (DL) models is widely used for mobile device security. Such models offer benefits in terms of detection accuracy and efficiency, but it is often difficult to understand how such learning models make decisions. As a result, these popular malware detection strategies are generally treated as black boxes, which can result in a lack of trust in the decisions made, as well as making adversarial attacks more difficult to detect. The field of eXplainable Artificial Intelligence (XAI) attempts to shed light on such black box models. In this paper, we apply XAI techniques to ML and DL models that have been trained on a challenging Android malware classification problem. Specifically, the classic ML models considered are Support Vector Machines (SVM), Random Forest, and k-Nearest Neighbors (k-NN), while the DL models we consider are Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN). The state-of-the-art XAI techniques that we apply to these trained models are Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP), ELI5, and Class Activation Mapping (CAM). We obtain global and local explanation results, and we discuss the utility of XAI techniques in this problem domain. We also provide a literature review of XAI work related to Android malware.
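
Of the techniques listed, partial dependence is simple enough to sketch directly: fix one feature to a chosen value for every sample, average the model's predictions, and repeat for each value of interest. The linear toy model, the data, and the helper name `partial_dependence` below are assumptions for illustration; they do not reproduce the models or results of the paper.

```python
from typing import Callable, List, Sequence


def partial_dependence(predict: Callable[[Sequence[float]], float],
                       X: List[List[float]],
                       feature: int,
                       values: Sequence[float]) -> List[float]:
    """Average prediction over X with one feature forced to each given value."""
    if not X:
        raise ValueError("X must contain at least one sample")
    averages = []
    for value in values:
        total = 0.0
        for row in X:
            modified = list(row)       # copy so the original data is untouched
            modified[feature] = value  # override only the feature of interest
            total += predict(modified)
        averages.append(total / len(X))
    return averages


# Toy scoring function standing in for a trained classifier (hypothetical).
def predict(row: Sequence[float]) -> float:
    return 0.2 + 0.6 * row[0] + 0.1 * row[1]


X = [[0, 0], [1, 1], [0, 1], [1, 0]]
curve = partial_dependence(predict, X, feature=0, values=[0, 1])
print(curve)
```

Plotting `values` against the returned averages gives the partial dependence curve; a steep curve indicates that the model's output is sensitive to that feature on average.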

artificial intelligence, machine learning, shapley value, (17 more...)

arXiv.org Artificial Intelligence

2411.16817

Country: Asia > Nepal (0.04)

Genre:

Research Report (1.00)
Overview (1.00)
Personal > Honors (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)
