Respond to Change with Constancy: Instruction-tuning with LLM for Non-I.I.D. Network Traffic Classification

Lin, Xinjie, Xiong, Gang, Gou, Gaopeng, Dong, Wenqi, Yu, Jing, Li, Zhen, Xia, Wei

arXiv.org Artificial Intelligence 

Encrypted traffic classification is highly challenging in network security because it requires extracting robust features from content-agnostic traffic data. Existing approaches face two critical issues: (i) distribution drift, caused by reliance on the closed-world assumption, limits adaptability to real-world, shifting patterns; (ii) dependence on labeled data restricts applicability where such data is scarce or unavailable. Large language models (LLMs) have demonstrated remarkable potential in offering generalizable solutions across a wide range of tasks, achieving notable success in various specialized fields. However, their effectiveness in traffic analysis remains constrained by the difficulty of adapting them to the unique requirements of the traffic domain. In this paper, we introduce a novel traffic representation model named Encrypted Traffic Out-of-Distribution Instruction Tuning with LLM (ETooL), which integrates LLMs with knowledge of traffic structures through a self-supervised instruction tuning paradigm. This framework establishes connections between textual information and traffic interactions. ETooL demonstrates more robust classification performance and superior generalization in both supervised and zero-shot traffic classification tasks. Additionally, we construct NETD, a traffic dataset designed to support dynamic distributional shifts, and use it to validate ETooL's effectiveness under varying distributional conditions. Furthermore, we evaluate the efficiency gains achieved through ETooL's instruction tuning approach.

Received 22 October 2024; revised 30 April 2025; accepted 20 May 2025. This work is supported by the National Key Research and Development Program of China, No. 2024YFF1401300.
Gang Xiong, Gaopeng Gou, Wenqi Dong, Jing Yu, Zhen Li and Wei Xia are with the Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100190, China, and also with the School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100085, China.

In recent years, the gradual move to full traffic encryption has become a reality, and explicit fingerprinting has increasingly failed. Different technical approaches have been proposed to address the needs of encrypted traffic analysis, including: (i) statistical feature-based approaches [15], [39], which extract statistical features and combine them with classical machine learning algorithms to cope with traffic without plaintext; (ii) raw feature-based approaches [8], [22], which instead select raw traffic features and capture complicated patterns with deep learning algorithms; and (iii) raw datagram-based approaches [18]-[20], which use deep neural networks to learn implicit correlations between datagram bytes.
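To make the contrast between approaches (i) and (iii) concrete, the following is a minimal Python sketch and not the method of any cited work. It assumes a hypothetical flow representation (a list of timestamp and packet-length pairs), and the function names `statistical_features` and `raw_byte_input`, the chosen statistics, and the padding width are all illustrative assumptions.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

# Hypothetical flow representation (an assumption for illustration):
# each packet is (timestamp in seconds, packet length in bytes).
Packet = Tuple[float, int]


def statistical_features(flow: Sequence[Packet]) -> Dict[str, float]:
    """Approach (i): summarise a flow with payload-independent statistics."""
    if not flow:
        raise ValueError("flow must contain at least one packet")
    lengths = [length for _, length in flow]
    times = [ts for ts, _ in flow]
    # Inter-arrival gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packet_count": float(len(lengths)),
        "mean_length": statistics.fmean(lengths),
        "std_length": statistics.pstdev(lengths),
        "max_length": float(max(lengths)),
        "mean_gap": statistics.fmean(gaps) if gaps else 0.0,
    }


def raw_byte_input(datagram: bytes, width: int = 8) -> List[float]:
    """Approach (iii): truncate or zero-pad a datagram to `width` bytes
    and scale each byte to [0, 1], as a neural network input might expect."""
    padded = datagram[:width].ljust(width, b"\x00")
    return [byte / 255.0 for byte in padded]


if __name__ == "__main__":
    flow = [(0.0, 100), (0.5, 200), (1.5, 300)]
    print(statistical_features(flow))
    print(raw_byte_input(b"\xff\x00", width=4))
```

In this sketch, the feature dictionary would be passed to a classical classifier, while the fixed-width byte vector would be fed directly to a deep model that learns its own representation.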
