Automatic Binary Dataset Construction for Machine Learning
–Neural Information Processing Systems
Binary code is pervasive, and binary analysis is a key task in reverse engineering, malware classification, and vulnerability discovery. Unfortunately, while there exist large corpora of malicious binaries, obtaining high-quality corpora of benign binaries for modern systems has proven challenging (e.g., due to licensing issues). Consequently, machine learning based pipelines for binary analysis utilize either costly commercial corpora (e.g., VirusTotal) or open-source binaries (e.g., coreutils) available in limited quantities.
Neural Information Processing Systems
May-29-2025, 21:37:49 GMT
- Country:
- North America > United States > Maryland (0.28)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: