Revisiting Static Feature-Based Android Malware Detection

Alam, Md Tanvirul, Bhusal, Dipkamal, Rastogi, Nidhi

Sep-11-2024, arXiv.org Artificial Intelligence

The increasing reliance on machine learning (ML) in computer security, particularly for malware classification, has driven significant advancements. However, the replicability and reproducibility of these results are often overlooked, leading to challenges in verifying research findings. This paper highlights critical pitfalls that undermine the validity of ML research in Android malware detection, focusing on dataset and methodological issues. We comprehensively analyze Android malware detection using two datasets and assess offline and continual learning settings with six widely used ML models. Our study reveals that when properly tuned, simpler baseline methods can often outperform more complex models. To address reproducibility challenges, we propose solutions for improving datasets and methodological practices, enabling fairer model comparisons. Additionally, we open-source our code to facilitate malware analysis, making it extensible for new models and datasets. Our paper aims to support future research in Android malware detection and other security domains, enhancing the reliability and reproducibility of published results.

dataset, learning, malware detection

Country:
- North America > United States
  - Nevada (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
  - California
    - San Diego County > San Diego (0.04)
    - Los Angeles County > Long Beach (0.04)
- Europe
  - France > Île-de-France (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia > Myanmar
  - Tanintharyi Region > Dawei (0.04)

Genre:
- Overview (0.88)
- Research Report > New Finding (0.68)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Communications > Mobile (1.00)
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (1.00)
    - Ensemble Learning (0.71)
    - Neural Networks > Deep Learning (0.68)
    - Performance Analysis > Accuracy (0.46)
