Edmonton
A Transformer-Based Framework for Payload Malware Detection and Classification
Stein, Kyle, Mahyari, Arash, Francia, Guillermo III, El-Sheikh, Eman
As malicious cyber threats become more sophisticated in breaching computer networks, the need for effective intrusion detection systems (IDSs) becomes crucial. Techniques such as Deep Packet Inspection (DPI) have been introduced to allow IDSs analyze the content of network packets, providing more context for identifying potential threats. IDSs traditionally rely on using anomaly-based and signature-based detection techniques to detect unrecognized and suspicious activity. Deep learning techniques have shown great potential in DPI for IDSs due to their efficiency in learning intricate patterns from the packet content being transmitted through the network. In this paper, we propose a revolutionary DPI algorithm based on transformers adapted for the purpose of detecting malicious traffic with a classifier head. Transformers learn the complex content of sequence data and generalize them well to similar scenarios thanks to their self-attention mechanism. Our proposed method uses the raw payload bytes that represent the packet contents and is deployed as man-in-the-middle. The payload bytes are used to detect malicious packets and classify their types. Experimental results on the UNSW-NB15 and CIC-IOT23 datasets demonstrate that our transformer-based model is effective in distinguishing malicious from benign traffic in the test dataset, attaining an average accuracy of 79\% using binary classification and 72\% on the multi-classification experiment, both using solely payload bytes.
Few-shot Named Entity Recognition via Superposition Concept Discrimination
Chen, Jiawei, Lin, Hongyu, Han, Xianpei, Lu, Yaojie, Jiang, Shanshan, Dong, Bin, Sun, Le
Few-shot NER aims to identify entities of target types with only limited number of illustrative instances. Unfortunately, few-shot NER is severely challenged by the intrinsic precise generalization problem, i.e., it is hard to accurately determine the desired target type due to the ambiguity stemming from information deficiency. In this paper, we propose Superposition Concept Discriminator (SuperCD), which resolves the above challenge via an active learning paradigm. Specifically, a concept extractor is first introduced to identify superposition concepts from illustrative instances, with each concept corresponding to a possible generalization boundary. Then a superposition instance retriever is applied to retrieve corresponding instances of these superposition concepts from large-scale text corpus. Finally, annotators are asked to annotate the retrieved instances and these annotated instances together with original illustrative instances are used to learn FS-NER models. To this end, we learn a universal concept extractor and superposition instance retriever using a large-scale openly available knowledge bases. Experiments show that SuperCD can effectively identify superposition concepts from illustrative instances, retrieve superposition instances from large-scale corpus, and significantly improve the few-shot NER performance with minimal additional efforts.
Deep Learning Based Sphere Decoding
Mohammadkarimi, Mostafa, Mehrabi, Mehrtash, Ardakani, Masoud, Jing, Yindi
In this paper, a deep learning (DL)-based sphere decoding algorithm is proposed, where the radius of the decoding hypersphere is learned by a deep neural network (DNN). The performance achieved by the proposed algorithm is very close to the optimal maximum likelihood decoding (MLD) over a wide range of signal-to-noise ratios (SNRs), while the computational complexity, compared to existing sphere decoding variants, is significantly reduced. This improvement is attributed to DNN's ability of intelligently learning the radius of the hypersphere used in decoding. The expected complexity of the proposed DL-based algorithm is analytically derived and compared with existing ones. It is shown that the number of lattice points inside the decoding hypersphere drastically reduces in the DL-based algorithm in both the average and worst-case senses. The effectiveness of the proposed algorithm is shown through simulation for high-dimensional multiple-input multiple-output (MIMO) systems, using high-order modulations.
Distributed Robust Learning based Formation Control of Mobile Robots based on Bioinspired Neural Dynamics
Xu, Zhe, Yan, Tao, Yang, Simon X., Gadsden, S. Andrew, Biglarbegian, Mohammad
This paper addresses the challenges of distributed formation control in multiple mobile robots, introducing a novel approach that enhances real-world practicability. We first introduce a distributed estimator using a variable structure and cascaded design technique, eliminating the need for derivative information to improve the real time performance. Then, a kinematic tracking control method is developed utilizing a bioinspired neural dynamic-based approach aimed at providing smooth control inputs and effectively resolving the speed jump issue. Furthermore, to address the challenges for robots operating with completely unknown dynamics and disturbances, a learning-based robust dynamic controller is developed. This controller provides real time parameter estimates while maintaining its robustness against disturbances. The overall stability of the proposed method is proved with rigorous mathematical analysis. At last, multiple comprehensive simulation studies have shown the advantages and effectiveness of the proposed method.
BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion
Wei, Jia, Zhang, Xingjun, Pedrycz, Witold
Bagging has achieved great success in the field of machine learning by integrating multiple base classifiers to build a single strong classifier to reduce model variance. The performance improvement of bagging mainly relies on the number and diversity of base classifiers. However, traditional deep learning model training methods are expensive to train individually and difficult to train multiple models with low similarity in a restricted dataset. Recently, diffusion models, which have been tremendously successful in the fields of imaging and vision, have been found to be effective in generating neural network model weights and biases with diversity. We creatively propose a Bagging deep learning training algorithm based on Efficient Neural network Diffusion (BEND). The originality of BEND comes from the first use of a neural network diffusion model to efficiently build base classifiers for bagging. Our approach is simple but effective, first using multiple trained model weights and biases as inputs to train autoencoder and latent diffusion model to realize a diffusion model from noise to valid neural network parameters. Subsequently, we generate several base classifiers using the trained diffusion model. Finally, we integrate these ba se classifiers for various inference tasks using the Bagging method. Resulting experiments on multiple models and datasets show that our proposed BEND algorithm can consistently outperform the mean and median accuracies of both the original trained model and the diffused model. At the same time, new models diffused using the diffusion model have higher diversity and lower cost than multiple models trained using traditional methods. The BEND approach successfully introduces diffusion models into the new deep learning training domain and provides a new paradigm for future deep learning training and inference.
Research Re: search & Re-search
Search algorithms are often categorized by their node expansion strategy. One option is the depth-first strategy, a simple backtracking strategy that traverses the search space in the order in which successor nodes are generated. An alternative is the best-first strategy, which was designed to make it possible to use domain-specific heuristic information. By exploring promising parts of the search space first, best-first algorithms are usually more efficient than depth-first algorithms. In programs that play minimax games such as chess and checkers, the efficiency of the search is of crucial importance. Given the success of best-first algorithms in other domains, one would expect them to be used for minimax games too. However, all high-performance game-playing programs are based on a depth-first algorithm. This study takes a closer look at a depth-first algorithm, AB, and a best-first algorithm, SSS. The prevailing opinion on these algorithms is that SSS offers the potential for a more efficient search, but that its complicated formulation and exponential memory requirements render it impractical. The theoretical part of this work shows that there is a surprisingly straightforward link between the two algorithms -- for all practical purposes, SSS is a special case of AB. Subsequent empirical evidence proves the prevailing opinion on SSS to be wrong: it is not a complicated algorithm, it does not need too much memory, and it is also not more efficient than depth-first search.
Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning
Paolino, Raffaele, Maskey, Sohir, Welke, Pascal, Kutyniok, Gitta
For example, in organic chemistry or bioinformatics, different types of cycles can impact We introduce r-loopy Weisfeiler-Leman (r-lWL), various chemical properties of the underlying molecules a novel hierarchy of graph isomorphism tests and (Deshpande et al., 2002; Koyutürk et al., 2004). Therefore, a corresponding GNN framework, r-lMPNN, that it is crucial to investigate whether GNNs can count certain can count cycles up to length r + 2. Most notably, substructures and to design architectures that surpass the we show that r-lWL can count homomorphisms limited power of MPNNs. of cactus graphs. This strictly extends classical Many models have been proposed to match or surpass the 1-WL, which can only count homomorphisms of expressive power of WL. Several draw inspiration from trees and, in fact, is incomparable to k-WL for any higher-order variants of the WL algorithm (Morris et al., fixed k. We empirically validate the expressive 2019), enabling them to count a broader range of substructures.
Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of "loss clipping" can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems.
On Strategy Stitching in Large Extensive Form Multiplayer Games
Computing a good strategy in a large extensive form game often demands an extraordinary amount of computer memory, necessitating the use of abstraction to reduce the game size. Typically, strategies from abstract games perform better in the real game as the granularity of abstraction is increased. This paper investigates two techniques for stitching a base strategy in a coarse abstraction of the full game tree, to expert strategies in fine abstractions of smaller subtrees. We provide a general framework for creating static experts, an approach that generalizes some previous strategy stitching efforts. In addition, we show that static experts can create strong agents for both 2-player and 3-player Leduc and Limit Texas Hold'em poker, and that a specific class of static experts can be preferred among a number of alternatives. Furthermore, we describe a poker agent that used static experts and won the 3-player events of the 2010 Annual Computer Poker Competition.
Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors
An accurate model of patient survival time can help in the treatment and care of cancer patients. The common practice of providing survival time estimates based only on population averages for the site and stage of cancer ignores many important individual differences among patients. In this paper, we propose a local regression method for learning patient-specific survival time distribution based on patient attributes such as blood tests and clinical assessments. When tested on a cohort of more than 2000 cancer patients, our method gives survival time predictions that are much more accurate than popular survival analysis models such as the Cox and Aalen regression models. Our results also show that using patient-specific attributes can reduce the prediction error on survival time by as much as 20% when compared to using cancer site and stage only.