Goto

Collaborating Authors

 Jiang, Jonathan


Multimodal Techniques for Malware Classification

arXiv.org Artificial Intelligence

The threat of malware is a serious concern for computer networks and systems, highlighting the need for accurate classification techniques. In this research, we experiment with multimodal machine learning approaches for malware classification, based on the structured nature of the Windows Portable Executable (PE) file format. Specifically, we train Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) models on features extracted from PE headers, we train these same models on features extracted from the other sections of PE files, and train each model on features extracted from the entire PE file. We then train SVM models on each of the nine header-sections combinations of these baseline models, using the output layer probabilities of the component models as feature vectors. We compare the baseline cases to these multimodal combinations. In our experiments, we find that the best of the multimodal models outperforms the best of the baseline cases, indicating that it can be advantageous to train separate models on distinct parts of Windows PE files.


Multi-Label Learning on Tensor Product Graph

AAAI Conferences

A large family of graph-based semi-supervised algorithms have been developed intuitively and pragmatically for the multi-label learning problem. These methods, however, only implicitly exploited the label correlation, as either part of graph weight or an additional constraint, to improve overall classification performance. Despite their seemingly quite different formulations, we show that all existing approaches can be uniformly referred to as a Label Propagation (LP) or Random Walk with Restart (RWR) on a Cartesian Product Graph (CPG). Inspired by this discovery, we introduce a new framework for multi-label classification task, employing the Tensor Product Graph (TPG) — the tensor product of the data graph with the class (label) graph — in which not only the intra-class but also the inter-class associations are explicitly represented as weighted edges among graph vertices. In stead of computing directly on TPG, we derive an iterative algorithm, which is guaranteed to converge and with the same computational complexity and the same amount of storage as the standard label propagation on the original data graph. Applications to four benchmark multi-label data sets illustrate that our method outperforms several state-of-the-art approaches.