Reducing Instability in Synthetic Data Evaluation with a Super-Metric in MalDataGen
Anna Luiza Gomes da Silva, Diego Kreutz, Angelo Diniz, Rodrigo Mansilha, Celso Nobre da Fonseca
–arXiv.org Artificial Intelligence
Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain due to the instability and lack of standardization of existing metrics. This work proposes a Super-Metric, implemented in the MalDataGen framework, that consolidates multiple quality indicators into a single score. Experiments involving ten generative models and five balanced datasets demonstrate that the Super-Metric is more stable and consistent than traditional metrics and correlates more strongly with the actual performance of classifiers.

Synthetic data generation has become an increasingly relevant strategy in cybersecurity [1], [2], [3], particularly as a way to mitigate the scarcity of real, complete, and high-quality datasets, a scarcity that limits the performance and generalization of machine learning models. Despite these advances, assessing the quality of synthetic data remains a complex and largely non-standardized methodological challenge [4], with no clear consensus on which metrics to use or how to combine them consistently. The literature reports significant fragmentation in the application of fidelity metrics: studies have identified more than 65 distinct indicators used independently to assess fidelity [5]. This diversity hinders model-to-model comparison, reduces experimental reproducibility, and complicates any integrated interpretation of data quality.
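As a rough illustration of the idea, the sketch below shows one way a composite "super-metric" could be computed: combine several normalized per-model quality metrics with a weighted average, then check how well the composite tracks downstream classifier performance via rank correlation. All metric names, weights, and scores here are hypothetical assumptions for illustration; the snippet does not reproduce MalDataGen's actual formulation.

```python
# A hedged sketch: aggregate several quality metrics into one score and
# measure how well that score tracks classifier performance across models.
# Metric names, weights, and all numbers are illustrative assumptions,
# not MalDataGen's actual formulation.
from typing import Dict

from scipy.stats import spearmanr


def super_metric(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-metric scores, each pre-normalized to [0, 1]."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical per-model metric scores (already scaled to [0, 1]).
models = {
    "gan_a":  {"fidelity": 0.82, "diversity": 0.74, "utility": 0.91, "privacy": 0.65},
    "vae_b":  {"fidelity": 0.71, "diversity": 0.80, "utility": 0.78, "privacy": 0.72},
    "copula": {"fidelity": 0.64, "diversity": 0.59, "utility": 0.70, "privacy": 0.81},
}
weights = {"fidelity": 0.3, "diversity": 0.2, "utility": 0.3, "privacy": 0.2}

# Hypothetical F1 of a classifier trained on each model's synthetic data.
classifier_f1 = {"gan_a": 0.88, "vae_b": 0.79, "copula": 0.69}

composite = [super_metric(models[m], weights) for m in models]
f1 = [classifier_f1[m] for m in models]

# Rank correlation between the composite score and classifier performance;
# a useful composite should rank models the same way the classifier does.
rho, _ = spearmanr(composite, f1)
print(f"composite scores: {[round(c, 3) for c in composite]}")
print(f"Spearman rho vs. classifier F1: {rho:.2f}")
```

A weighted average is only one plausible aggregation scheme; the stability the paper reports would depend on how the individual metrics are normalized and weighted across runs.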
Nov-21-2025
- Country:
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.05)
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Security & Privacy (0.79)
- Technology: