MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
Kayua Oleques Paim, Angelo Gaspar Diniz Nogueira, Diego Kreutz, Weverton Cordeiro, Rodrigo Brandao Mansilha
arXiv.org Artificial Intelligence
The scarcity of high-quality data hinders malware detection by limiting the performance of machine learning (ML) models. We introduce MalDataGen, an open-source, modular framework for generating high-fidelity synthetic tabular data with a modular set of deep learning models (e.g., WGAN-GP, VQ-VAE). Evaluated with a dual validation protocol (train-on-real/test-on-synthetic, TR-TS, and train-on-synthetic/test-on-real, TS-TR), seven classifiers, and utility metrics, MalDataGen outperforms baselines such as SDV while preserving data utility. Its flexible design enables seamless integration into detection pipelines, offering a practical solution for cybersecurity applications.
I. Introduction
Modern machine learning algorithms, particularly deep learning architectures, depend on large-scale datasets with reliable annotations to achieve optimal performance.
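The dual validation protocol mentioned in the abstract pairs two directions of evaluation: a classifier trained on real data is scored on synthetic data (TR-TS), and a classifier trained on synthetic data is scored on real data (TS-TR). High scores in both directions suggest that the synthetic data preserves the class structure of the real data. The sketch below illustrates only that idea and is not MalDataGen's code or API. It assumes a hypothetical nearest-centroid classifier as a stand-in for the paper's seven classifiers, uses accuracy as the only metric, and represents each sample as a (feature vector, label) pair with 0 for benign and 1 for malware; the names `nearest_centroid_fit`, `accuracy`, and `dual_validation` are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sample format: (feature vector, label), 0 = benign, 1 = malware.
Row = Tuple[List[float], int]
Classifier = Callable[[List[float]], int]


def nearest_centroid_fit(train: Sequence[Row]) -> Classifier:
    """Hypothetical stand-in classifier: predicts the label of the closest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    # Per-class mean feature vector.
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: List[float]) -> int:
        # Squared Euclidean distance to each centroid; pick the nearest class.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def accuracy(predict: Classifier, test: Sequence[Row]) -> float:
    """Fraction of test samples whose label is predicted correctly."""
    if not test:
        raise ValueError("test set must not be empty")
    return sum(predict(x) == y for x, y in test) / len(test)


def dual_validation(
    real: Sequence[Row],
    synthetic: Sequence[Row],
    fit: Callable[[Sequence[Row]], Classifier] = nearest_centroid_fit,
) -> Dict[str, float]:
    """Score a synthetic dataset in both directions (illustrative sketch only)."""
    return {
        "TR-TS": accuracy(fit(real), synthetic),  # train on real, test on synthetic
        "TS-TR": accuracy(fit(synthetic), real),  # train on synthetic, test on real
    }
```

With toy data in which the synthetic samples sit near the real samples of the same class, both directions score 1.0; if the synthetic labels are swapped, both drop to 0.0, which is the kind of failure the protocol is meant to expose. In practice, `fit` would be replaced by each of the real classifiers under evaluation.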
November 4, 2025