Discovering Human and Machine Readable Descriptions of Malware Families

Anderson, Blake (Cisco Systems, Inc.) | McGrew, David (Cisco Systems, Inc.) | Paul, Subharthi (Cisco Systems, Inc.)

Apr-12-2016–AAAI Conferences

While an immense amount of work has gone into novel clustering algorithms, little work has focused on developing compact, domain-specific explanations for the results of those algorithms. Attaching semantic meaning to a cluster has numerous benefits, including a description that is both human and machine readable. In this paper, we assume that the clusters are given to us and find the minimal set of features that differentiates one cluster from the remaining samples. We formulate this problem as an integer linear program. Because samples not belonging to the cluster are included in the optimization formulation, the resulting description is minimal and contains no false positives. The efficacy of this method is demonstrated on simulated data and on real-world malware data run in a sandbox that collects behavioral characteristics. In the case of malware, once a family has been clustered, it would otherwise be sent to a reverse engineer tasked with determining the meaning of the clustering results and disseminating this information through signatures or indicators of compromise. This is a time-consuming process that can take hours to weeks depending on the complexity of the malware family. The methods presented in this paper automatically generate optimal signatures, which can then be quickly propagated to help contain the spread of a malware family.
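The abstract does not spell out the integer linear program, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes each sample is a set of binary behavioral features, restricts candidates to features shared by every sample in the cluster, and requires that every outside sample lack at least one selected feature (the "no false positives" constraint). An exhaustive search over subsets of increasing size stands in for an ILP solver, which is exact but only practical for small inputs. The function name `minimal_signature` and all feature names are hypothetical.

```python
from itertools import combinations


def minimal_signature(cluster, others):
    """Return a smallest set of features that every cluster sample has
    and that no outside sample fully contains, or None if none exists.

    Assumes `cluster` is a non-empty list of feature sets.
    """
    # Candidate features: those present in every sample of the cluster.
    common = sorted(set.intersection(*map(set, cluster)))
    # Exhaustive search by increasing size; an ILP solver would replace this.
    for size in range(1, len(common) + 1):
        for subset in combinations(common, size):
            chosen = set(subset)
            # Each outside sample must lack at least one chosen feature.
            if all(not chosen <= set(sample) for sample in others):
                return chosen
    return None


# Hypothetical behavioral features observed in a sandbox.
cluster = [
    {"writes_run_key", "dns_example_bad", "creates_mutex_x", "reads_hosts"},
    {"writes_run_key", "dns_example_bad", "creates_mutex_x"},
]
others = [
    {"writes_run_key", "reads_hosts"},
    {"creates_mutex_x", "dns_example_bad"},
    {"writes_run_key", "dns_example_bad"},
]

signature = minimal_signature(cluster, others)
print(sorted(signature))
```

In this toy data no single feature separates the cluster from the other samples, so the smallest separating description is a pair of features. The returned set can be read by a person and also matched mechanically, by checking whether a new sample contains every feature in it.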

artificial intelligence, data mining, machine learning

Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Data Science > Data Mining (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning
      - Performance Analysis > Accuracy (0.72)
      - Statistical Learning > Clustering (0.68)
