

Faster and More Accurate Malware Detection Through Predictive Machine Learning: Correlating Static and Behavioral Features

#artificialintelligence

Decades before it became a buzzword, machine learning had already proven its ability to extract information from vast datasets: spotting hard-to-see patterns, classifying and clustering data, and making predictions. Among its many real-life applications, cybersecurity remains one of the top use areas: machine learning gives traditional cybersecurity solutions the edge they need to catch destructive threats such as ransomware before they are deployed in a system, which saves organizations time, money, and reputation. Traditional machine learning largely deals with historical knowledge. It allows computers to make inferences based on datasets that have been previously labeled by humans. In cybersecurity, training a machine learning model to learn what malicious files and programs look like can help in the discovery of new, emerging, or unclassified threats via correlation.
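To make "inferences based on datasets that have been previously labeled by humans" concrete, here is a minimal sketch of one of the simplest supervised methods, a nearest-centroid classifier. The two features (byte entropy and fraction of suspicious imports), the sample values, and the function names are all illustrative assumptions; the excerpt does not describe the model or features actually used.

```python
from statistics import mean

def train_centroids(samples, labels):
    """Average the feature vectors of each label (the 'historical knowledge')."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the new sample."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical, human-labeled training data:
# each row is [byte entropy, fraction of suspicious imports].
samples = [[7.8, 0.9], [7.5, 0.8], [4.1, 0.1], [4.6, 0.2]]
labels = ["malicious", "malicious", "benign", "benign"]

centroids = train_centroids(samples, labels)

# A previously unseen file is labeled by similarity to what was seen before.
prediction = predict(centroids, [7.2, 0.7])
print(prediction)
```

A production detector would use far richer features and a stronger model, but the workflow is the same: learn from labeled history, then infer labels for unseen files.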


Naming the Unknown: Labeling Unknown Files Through Machine Learning

#artificialintelligence

A study by Trend Micro researchers showed that more than 83 percent of all downloaded software files are unknown or unclassified, even two years after they were first observed in the wild. Because most malware threats come from software download events, the researchers developed a human-readable machine learning system that successfully classifies unknown files as either benign or malicious. The study involved a dataset of 3 million anonymized web-based software download events gathered over a seven-month period. These events were analyzed using multiple sources of ground truth, drawn both from internal, proprietary Trend Micro systems and from publicly available ones. However, less than 17 percent of the dataset was labeled using traditional means.


Uncovering Unknown Threats With Human-Readable Machine Learning

#artificialintelligence

Aided by machine learning, we analyzed data on 3 million software downloads from hundreds of thousands of internet-connected machines. We looked into the major domains from which different malware categories were downloaded and discussed which client applications were most often targeted by malware infection. We also looked at code signing abuse and examined certain certification authorities whose certificates were found to have been used for signing malicious code. In this blog post, we discuss how we developed a human-readable machine learning system that can determine whether a downloaded file is benign or malicious. The development of this actionable intelligent system stemmed from the question: How can we make our knowledge about global software download events actionable?
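As a rough illustration of what "human-readable" can mean, the sketch below learns a one-rule classifier (a decision stump) and prints it as an IF/THEN statement that an analyst can inspect. This is not the system described in the post, whose model is not specified in this excerpt. The feature names and the five download events are hypothetical toy data, and because code signing can be abused, a real rule would not treat a signature as proof that a file is benign.

```python
def learn_stump(rows, labels):
    """Pick the single boolean feature that best separates the labels.

    Returns (feature_name, label_if_true, label_if_false, training_accuracy).
    """
    best = None
    for feature in rows[0]:
        for if_true, if_false in (("malicious", "benign"), ("benign", "malicious")):
            correct = sum(
                (if_true if row[feature] else if_false) == label
                for row, label in zip(rows, labels)
            )
            accuracy = correct / len(rows)
            if best is None or accuracy > best[3]:
                best = (feature, if_true, if_false, accuracy)
    return best

def describe(rule):
    """Render the learned rule as a sentence a human can review."""
    feature, if_true, if_false, accuracy = rule
    return (f"IF {feature} THEN {if_true} ELSE {if_false} "
            f"(training accuracy {accuracy:.0%})")

# Hypothetical download events with hypothetical boolean features.
rows = [
    {"file_is_signed": False, "domain_is_popular": False},
    {"file_is_signed": False, "domain_is_popular": True},
    {"file_is_signed": True, "domain_is_popular": True},
    {"file_is_signed": True, "domain_is_popular": False},
    {"file_is_signed": False, "domain_is_popular": False},
]
labels = ["malicious", "malicious", "benign", "benign", "malicious"]

rule = learn_stump(rows, labels)
print(describe(rule))
```

The point of the design is that the model's output is the rule itself, so a verdict can be traced back to a condition a person can read and challenge.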


Machine Learning in Malware Detection

#artificialintelligence

Malware recognition modules decide whether an object is a threat based on the data they have collected on it. This data may be collected at different phases. Pre-execution phase data is anything you can tell about a file without executing it. This may include executable file format descriptions, code descriptions, binary data statistics, text strings, information extracted via code emulation, and other similar data.

In the early days of the cyber era, the number of malware threats was relatively low, and simple handcrafted pre-execution rules were often enough to detect threats. But a decade ago, the tremendous growth of the malware stream made it impossible for anti-malware solutions to rely solely on the expensive manual creation of detection rules. It was natural for anti-malware companies to start augmenting their malware detection and classification with machine learning, an area of computer science that has shown great success in image recognition, search, and decision-making.

Machine Learning Methods for Malware Detection

In this article, we summarize our decade's worth of experience with implementing machine learning to protect our customers from cyberthreats. A machine learning algorithm discovers and formalizes the principles that underlie the data it sees. With this knowledge, the algorithm can reason about the properties of previously unseen samples. In malware detection, a previously unseen sample could be a new file, and its hidden property could be whether it is malware or benign. A mathematically formalized set of principles underlying data properties is called the model. Machine learning is not a single method but a broad variety of approaches, each with different capacities and different tasks it suits best.

Unsupervised learning

One machine learning approach is unsupervised learning. In this setting, we are given only a data set without the right answers for the task. The goal is to discover the structure of the data or the law of data generation. One important example is clustering, the task of splitting a data set into groups of similar objects. Another task is representation learning, which builds an informative feature set for objects based on their low-level description (for example, with an autoencoder model). Large unlabeled datasets are available to cybersecurity vendors, and the cost of having experts label them manually is high, which makes unsupervised learning valuable for threat detection. Clustering can help optimize the effort spent on manually labeling new samples. With an informative embedding, we can decrease the number of labeled objects needed for the next machine learning approach in our pipeline: supervised learning.
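The two ideas above, pre-execution features and clustering, can be sketched together. The example computes two simple pre-execution features from raw bytes (a binary data statistic and a text-string count) and groups unlabeled samples with a plain k-means loop. The choice of features, the synthetic byte blobs, the function names, and the seeding of centroids from the first k points are assumptions made for illustration; the article does not state which features or clustering algorithm the vendor uses.

```python
import math
import re
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def printable_string_count(data: bytes, min_length: int = 4) -> int:
    """Number of runs of printable ASCII at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{" + str(min_length).encode() + rb",}"
    return len(re.findall(pattern, data))

def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

# Synthetic stand-ins for unlabeled files: no executable is read or run.
files = [
    b"\x00" * 64,           # low entropy, no strings
    bytes(range(256)),      # high entropy, resembles packed data
    b"AAAA" * 16,           # low entropy, one long string
    bytes(range(256)) * 2,  # high entropy
]
features = [[byte_entropy(f), printable_string_count(f)] for f in files]
clusters = kmeans(features, k=2)
print(clusters)
```

An analyst could then label one representative per cluster instead of every file, which is how clustering reduces manual labeling effort. In practice the features would be scaled to comparable ranges before measuring distances.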