A Hybrid Deep Learning and Anomaly Detection Framework for Real-Time Malicious URL Classification

Khaled Berkani, Rafik Zeraoulia

arXiv.org Artificial Intelligence 

The number and sophistication of cyberthreats have grown alongside the internet's exponential expansion, especially threats spread through malicious URLs. Malicious websites are used to launch a variety of attacks, such as phishing, drive-by downloads, command-and-control communications, and data exfiltration. Because attackers constantly change URLs to avoid detection, traditional blacklisting techniques cannot keep up with the dynamic and adversarial nature of contemporary threats. As a result, intelligent algorithms that can recognize intricate patterns in URLs and identify malicious ones in real time have become crucial components of modern cybersecurity defense architectures [1, 13]. Because machine learning (ML) and deep learning (DL) approaches can capture non-linear relationships in input data and generalize from observed patterns, they have shown considerable promise for malicious URL detection [2, 3]. However, several obstacles remain: class imbalance (labeled malicious URLs are scarce compared with benign ones); adversarial techniques that produce highly obfuscated or anomalous URLs and undermine the effectiveness of traditional classifiers; and detection systems that are mostly restricted to monolingual user interfaces and lack real-time usability features.