Naming the Unknown: Labeling Unknown Files Through Machine Learning
A study by Trend Micro researchers showed that more than 83 percent of all downloaded software files remain unknown or unclassified, even two years after they were first observed in the wild. Because most malware threats arrive through software download events, the researchers developed a human-readable machine learning system that successfully classifies unknown files as either benign or malicious. The study involved a dataset of 3 million anonymized web-based software download events gathered over a seven-month period. These events were analyzed using multiple sources of ground truth, drawn from both internal, proprietary Trend Micro systems and publicly available ones. Even so, less than 17 percent of the dataset was labeled using traditional means.
Aug-17-2018, 09:19:40 GMT