Goto

Collaborating Authors

Comparison of Deep Learning and the Classical Machine Learning Algorithm for the Malware Detection

arXiv.org Artificial Intelligence

Recently, Deep Learning has been showing promising results in various Artificial Intelligence applications like image recognition, natural language processing, language modeling, neural machine translation, etc. Although, in general, it is computationally more expensive as compared to classical machine learning techniques, their results are found to be more effective in some cases. Therefore, in this paper, we investigated and compared one of the Deep Learning Architecture called Deep Neural Network (DNN) with the classical Random Forest (RF) machine learning algorithm for the malware classification. We studied the performance of the classical RF and DNN with 2, 4 & 7 layers architectures with the four different feature sets, and found that irrespective of the features inputs, the classical RF accuracy outperforms the DNN.


An effective approach for classification of advanced malware with high accuracy

arXiv.org Artificial Intelligence

Combating malware is very important for software/systems security, but to prevent the software/systems from the advanced malware, viz. metamorphic malware is a challenging task, as it changes the structure/code after each infection. Therefore in this paper, we present a novel approach to detect the advanced malware with high accuracy by analyzing the occurrence of opcodes (features) by grouping the executables. These groups are made on the basis of our earlier studies [1] that the difference between the sizes of any two malware generated by popular advanced malware kits viz. PS-MPC, G2 and NGVCK are within 5 KB. On the basis of obtained promising features, we studied the performance of thirteen classifiers using N-fold cross-validation available in machine learning tool WEKA. Among these thirteen classifiers we studied in-depth top five classifiers (Random forest, LMT, NBT, J48 and FT) and obtain more than 96.28% accuracy for the detection of unknown malware, which is better than the maximum detection accuracy (95.9%) reported by Santos et al (2013). In these top five classifiers, our approach obtained a detection accuracy of 97.95% by the Random forest.


Feds warn against Hidden Cobra's Hoplight malware SC Media

#artificialintelligence

A consortium of U.S. federal agencies released a notification on Hoplight, a new data collector malware being used by the North Korean cyberespionage group Hidden Cobra (aka Lazuras). The Department of Homeland Security, FBI, and Department of Defense in its malware analysis report on Hoplight noted it obfuscation plays a large role in the malware's behavior containing 20 malicious executable files, 16 of which are designed to mask activity between the malware and the operator. "When executed the malware will collect system information about the victim machine including OS Version, Volume Information, and System Time, as well as enumerate the system drives and partitions," the report states. The malware is extremely sophisticated and uses proxies to generate fake TLS handshake sessions using valid public SSL certificates, so the network connection is effectively disguised. Two versions of Hoplight exist "So if the opcode for Keepalive in version 1 is 0xB6C1, the opcode in version 2 will be 0xB6C2," the report stated.


DRLDO: A novel DRL based De-ObfuscationSystem for Defense against Metamorphic Malware

arXiv.org Artificial Intelligence

In this paper, we propose a novel mechanism to normalize metamorphic and obfuscated malware down at the opcode level and hence create an advanced metamorphic malware de-obfuscation and defense system. We name this system DRLDO, for Deep Reinforcement Learning based De-Obfuscator. With the inclusion of the DRLDO as a sub-component, an existing Intrusion Detection System could be augmented with defensive capabilities against 'zero-day' attacks from obfuscated and metamorphic variants of existing malware. This gains importance, not only because there exists no system to date that uses advanced DRL to intelligently and automatically normalize obfuscation down even to the opcode level, but also because the DRLDO system does not mandate any changes to the existing IDS. The DRLDO system does not even mandate the IDS' classifier to be retrained with any new dataset containing obfuscated samples. Hence DRLDO could be easily retrofitted into any existing IDS deployment. We designed, developed, and conducted experiments on the system to evaluate the same against multiple-simultaneous attacks from obfuscations generated from malware samples from a standardized dataset that contains multiple generations of malware. Experimental results prove that DRLDO was able to successfully make the otherwise un-detectable obfuscated variants of the malware detectable by an existing pre-trained malware classifier. The detection probability was raised well above the cut-off mark to 0.6 for the classifier to detect the obfuscated malware unambiguously. Further, the de-obfuscated variants generated by DRLDO achieved a very high correlation (of 0.99) with the base malware. This observation validates that the DRLDO system is actually learning to de-obfuscate and not exploiting a trivial trick.


An investigation of a deep learning based malware detection system

arXiv.org Artificial Intelligence

We investigate a Deep Learning based system for malware detection. In the investigation, we experiment with different combination of Deep Learning architectures including Auto-Encoders, and Deep Neural Networks with varying layers over Malicia malware dataset on which earlier studies have obtained an accuracy of (98%) with an acceptable False Positive Rates (1.07%). But these results were done using extensive man-made custom domain features and investing corresponding feature engineering and design efforts. In our proposed approach, besides improving the previous best results (99.21% accuracy and a False Positive Rate of 0.19%) indicates that Deep Learning based systems could deliver an effective defense against malware. Since it is good in automatically extracting higher conceptual features from the data, Deep Learning based systems could provide an effective, general and scalable mechanism for detection of existing and unknown malware.