Collaborating Authors: Yang, Xiaoniu


Reassessing Layer Pruning in LLMs: New Insights and Methods

arXiv.org Artificial Intelligence

Although large language models (LLMs) have achieved remarkable success across various domains, their considerable scale necessitates substantial computational resources, posing significant challenges for deployment in resource-constrained environments. Layer pruning, a simple yet effective compression method, removes layers of a model directly, reducing computational overhead. However, what are the best practices for layer pruning in LLMs? Are sophisticated layer selection metrics truly effective? Does the LoRA (Low-Rank Adaptation) family, widely regarded as a leading method for fine-tuning pruned models, truly meet expectations when applied to post-pruning fine-tuning? To answer these questions, we dedicate thousands of GPU hours to benchmarking layer pruning in LLMs and gaining insights across multiple dimensions. Our results demonstrate that a simple approach, i.e., pruning the final 25% of layers followed by fine-tuning the lm_head and the remaining last three layers, yields remarkably strong performance. Following this guide, we prune Llama-3.1-8B-It and obtain a model that outperforms many popular LLMs of similar size, such as ChatGLM2-6B, Vicuna-7B-v1.5, Qwen1.5-7B and Baichuan2-7B. We release the optimal model weights on Hugging Face, and the code is available on GitHub.
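
To make the recipe above concrete, here is a minimal sketch of pruning the final 25% of decoder layers and unfreezing only lm_head plus the last three remaining layers, using the Hugging Face transformers API. This is not the authors' released code; the model identifier and dtype are assumptions, and the fine-tuning loop itself is omitted.

```python
# Minimal sketch of the pruning recipe described above (not the authors' released code).
import torch
from transformers import AutoModelForCausalLM

# Assumed model identifier; any LLaMA-style decoder-only checkpoint would work similarly.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

# 1) Prune the final 25% of decoder layers.
layers = model.model.layers
keep = int(len(layers) * 0.75)
model.model.layers = torch.nn.ModuleList(layers[:keep])
model.config.num_hidden_layers = keep

# 2) Freeze everything, then unfreeze lm_head and the last three remaining layers.
for p in model.parameters():
    p.requires_grad = False
for p in model.lm_head.parameters():
    p.requires_grad = True
for layer in model.model.layers[-3:]:
    for p in layer.parameters():
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.1f}M")
# The pruned model can now be fine-tuned with any standard causal-LM training loop.
```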


RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively

arXiv.org Artificial Intelligence

Deep learning has revolutionized computing in many real-world applications, arguably due to its remarkable performance and extreme convenience as an end-to-end solution. However, deep learning models can be costly to train and use, especially large-scale models, making it necessary to optimize overly complicated models into smaller ones in resource-limited scenarios such as mobile applications, or simply for resource saving. The key question in such model optimization is: how can we effectively identify and measure the redundancy in a deep learning model structure? While several common metrics exist in popular model optimization techniques to measure the performance of models after optimization, they cannot quantitatively indicate the degree of remaining redundancy. To address this problem, we present a novel testing approach, RedTest, built around a new testing metric called the Model Structural Redundancy Score (MSRS), which quantitatively measures the degree of redundancy in a deep learning model structure. We first show that MSRS is effective in both revealing and assessing the redundancy issues in many state-of-the-art models, which urgently calls for model optimization. Then, we utilize MSRS to assist deep learning model developers in two practical application scenarios: 1) in Neural Architecture Search, we design a novel redundancy-aware algorithm to guide the search for the optimal model structure and demonstrate its effectiveness by comparing it to existing standard NAS practice; 2) in the pruning of large-scale pre-trained models, we prune the redundant layers of pre-trained models with the guidance of layer similarity to derive less redundant models of much smaller size. Extensive experimental results demonstrate that removing such redundancy has a negligible effect on model utility.
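
The abstract does not spell out the MSRS formula, so the sketch below only illustrates the intuition it describes: layers whose outputs are highly similar to their inputs contribute little and are candidates for removal. The cosine-similarity measure, the toy blocks, and all shapes are assumptions made purely for illustration.

```python
# Rough illustration of layer-similarity-guided redundancy scoring (not the MSRS definition).
import torch
import torch.nn.functional as F

@torch.no_grad()
def layer_similarity_scores(blocks, x):
    """Return one score per block: cosine similarity between its input and output, batch-averaged."""
    scores = []
    h = x
    for block in blocks:
        out = block(h)
        sim = F.cosine_similarity(h.flatten(1), out.flatten(1), dim=1).mean().item()
        scores.append(sim)
        h = out
    return scores

# Toy example: a stack of small MLP blocks standing in for a real backbone.
blocks = torch.nn.ModuleList(
    [torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.GELU()) for _ in range(6)]
)
x = torch.randn(32, 64)
print(layer_similarity_scores(blocks, x))  # high scores suggest redundant layers
```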


Lateral Movement Detection via Time-aware Subgraph Classification on Authentication Logs

arXiv.org Artificial Intelligence

Lateral movement is a crucial component of advanced persistent threat (APT) attacks in networks. Attackers exploit security vulnerabilities in internal networks or IoT devices, expanding their control after initial infiltration to steal sensitive data or carry out other malicious activities, posing a serious threat to system security. Existing research suggests that attackers generally employ seemingly unrelated operations to mask their malicious intentions, thereby evading existing lateral movement detection methods and hiding their intrusion traces. In this regard, we analyze host authentication log data from a graph perspective and propose a multi-scale lateral movement detection framework called LMDetect. The main workflow of this framework proceeds as follows: 1) Construct a heterogeneous multigraph from host authentication log data to strengthen the correlations among internal system entities; 2) Design a time-aware subgraph generator to extract subgraphs centered on authentication events from the heterogeneous authentication multigraph; 3) Design a multi-scale attention encoder that leverages both local and global attention to capture hidden anomalous behavior patterns in the authentication subgraphs, thereby achieving lateral movement detection. Extensive experiments on two real-world authentication log datasets demonstrate the effectiveness and superiority of our framework in detecting lateral movement behaviors.
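
As a rough illustration of steps 1) and 2) above, the sketch below builds a heterogeneous authentication multigraph with networkx and extracts a time-windowed subgraph around a chosen authentication event. It is not the LMDetect implementation; the log fields, window size, and node types are assumptions.

```python
# Illustrative-only sketch of heterogeneous multigraph construction and time-aware
# subgraph extraction from authentication logs (field names are assumptions).
from datetime import datetime, timedelta
import networkx as nx

auth_events = [
    # (timestamp, source_host, destination_host, user)
    ("2024-05-01 09:00:00", "host_a", "host_b", "alice"),
    ("2024-05-01 09:02:10", "host_b", "host_c", "alice"),
    ("2024-05-01 13:45:00", "host_d", "host_e", "bob"),
]

# 1) Heterogeneous multigraph: hosts and users as nodes, one edge per authentication event.
G = nx.MultiDiGraph()
for ts, src, dst, user in auth_events:
    t = datetime.fromisoformat(ts)
    G.add_edge(src, dst, user=user, time=t)
    G.add_edge(user, dst, relation="logs_into", time=t)

# 2) Time-aware subgraph centred on one authentication event: keep only edges that
#    fall inside a window around the event's timestamp.
def time_window_subgraph(graph, center_time, window=timedelta(minutes=30)):
    keep = [
        (u, v, k)
        for u, v, k, data in graph.edges(keys=True, data=True)
        if abs(data["time"] - center_time) <= window
    ]
    return graph.edge_subgraph(keep)

center = datetime.fromisoformat("2024-05-01 09:01:00")
sub = time_window_subgraph(G, center)
print(sub.number_of_nodes(), sub.number_of_edges())  # the subgraph fed to the encoder
```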


Mixing Signals: Data Augmentation Approach for Deep Learning Based Modulation Recognition

arXiv.org Artificial Intelligence

With the rapid development of deep learning, automatic modulation recognition (AMR), as an important task in cognitive radio, has gradually shifted from traditional feature extraction and classification to automatic classification by deep learning technology. However, deep learning models are data-driven methods that often require a large amount of data for training. Data augmentation, as a strategy for expanding the dataset, can improve the generalization of deep learning models and thus improve their accuracy to a certain extent. In this paper, for AMR of radio signals, we propose a data augmentation strategy based on mixing signals and consider four specific methods (Random Mixing, Maximum-Similarity-Mixing, θ-Similarity Mixing and n-times Random Mixing) to achieve data augmentation. Experiments show that our proposed method can improve the classification accuracy of deep learning based AMR models on the full public dataset RML2016.10a. In particular, for the case of a single signal-to-noise-ratio signal set, the classification accuracy can be significantly improved, which verifies the effectiveness of the methods.
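
As an illustration of the simplest variant, Random Mixing, the sketch below linearly mixes each I/Q sample with another randomly chosen sample. Restricting the mix to samples of the same modulation class, the mixing weight, and the RML2016.10a-style array shape are assumptions, not details taken from the paper.

```python
# Minimal sketch of a Random Mixing style augmentation for I/Q samples.
import numpy as np

def random_mixing(signals, labels, alpha=0.5, rng=None):
    """Mix each signal with another randomly chosen signal of the same class (assumption)."""
    rng = rng or np.random.default_rng()
    mixed = np.empty_like(signals)
    for i, (x, y) in enumerate(zip(signals, labels)):
        same_class = np.flatnonzero(labels == y)
        j = rng.choice(same_class)
        mixed[i] = alpha * x + (1.0 - alpha) * signals[j]
    return mixed, labels.copy()

# Toy batch in RML2016.10a layout: (num_samples, 2, 128) for the I and Q channels.
signals = np.random.randn(8, 2, 128).astype(np.float32)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
aug_x, aug_y = random_mixing(signals, labels)
print(aug_x.shape, aug_y.shape)  # augmented samples appended to the training set
```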


Rethinking Graph Transformer Architecture Design for Node Classification

arXiv.org Artificial Intelligence

Graph Transformer (GT), as a special type of Graph Neural Network (GNN), utilizes multi-head attention to facilitate high-order message passing. However, this also imposes several limitations in node classification applications: 1) nodes are susceptible to global noise; 2) self-attention computation cannot scale well to large graphs. In this work, we conduct extensive observational experiments to explore the adaptability of the GT architecture to node classification tasks and draw several conclusions: the current multi-head self-attention module in GT can be completely replaced, while the feed-forward neural network module proves to be valuable. Based on this, we decouple the propagation (P) and transformation (T) of GNNs and explore a powerful GT architecture, named GNNFormer, which is based on P/T-combined message passing and adapted for node classification in both homophilous and heterophilous scenarios. Extensive experiments on 12 benchmark datasets demonstrate that our proposed GT architecture can effectively adapt to node classification tasks without being affected by global noise or computational efficiency limitations.
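
The sketch below shows one way to realize a decoupled P/T block in PyTorch: propagation over a sparse adjacency matrix (in place of the multi-head self-attention found to be replaceable) followed by a feed-forward transformation. It is not the GNNFormer release; the hop count, FFN sizes, and residual/normalization choices are assumptions.

```python
# Sketch of a decoupled propagation (P) / transformation (T) block, not the released GNNFormer code.
import torch
import torch.nn as nn

class PropagateThenTransform(nn.Module):
    """One P/T block: feature propagation over a sparse adjacency, then an FFN."""

    def __init__(self, dim, hidden_dim, num_hops=2, dropout=0.1):
        super().__init__()
        self.num_hops = num_hops
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, adj):
        # Propagation: repeated sparse multiplication with the adjacency,
        # replacing the multi-head self-attention the paper finds replaceable.
        for _ in range(self.num_hops):
            x = torch.sparse.mm(adj, x)
        # Transformation: the feed-forward module the paper finds valuable.
        return self.norm(x + self.ffn(x))

# Toy graph: 5 nodes, sparse adjacency with uniform 0.5 edge weights
# (a stand-in for a properly normalized adjacency).
idx = torch.tensor([[0, 1, 1, 2, 3, 4], [1, 0, 2, 1, 4, 3]])
vals = torch.full((6,), 0.5)
adj = torch.sparse_coo_tensor(idx, vals, (5, 5)).coalesce()
block = PropagateThenTransform(dim=16, hidden_dim=32)
out = block(torch.randn(5, 16), adj)
print(out.shape)  # torch.Size([5, 16])
```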


Clarify Confused Nodes Through Separated Learning

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have achieved remarkable advances in graph-oriented tasks. However, real-world graphs invariably contain a certain proportion of heterophilous nodes, challenging the homophily assumption of classical GNNs and hindering their performance. Most existing studies continue to design generic models with shared weights between heterophilous and homophilous nodes. Despite the incorporation of high-order messages or multi-channel architectures, these efforts often fall short. A minority of studies attempt to train different node groups separately but suffer from inappropriate separation metrics and low efficiency. In this paper, we first propose a new metric, termed Neighborhood Confusion (NC), to facilitate a more reliable separation of nodes. We observe that node groups with different levels of NC values exhibit certain differences in intra-group accuracy and visualized embeddings. These observations pave the way for the Neighborhood Confusion-guided Graph Convolutional Network (NCGCN), in which nodes are grouped by their NC values and undergo intra-group weight sharing and message passing. Extensive experiments on both homophilous and heterophilous benchmarks demonstrate that our framework can effectively separate nodes and yield significant performance improvements compared to the latest methods. The source code will be released soon.
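
The exact definition of the NC metric is not given in the abstract, so the snippet below uses one plausible stand-in, one minus the dominant-label share in a node's closed neighborhood, purely to illustrate how nodes could be split into low- and high-confusion groups before separate training. The 0.3 threshold and the use of ground-truth labels on the karate-club graph are assumptions.

```python
# Illustrative stand-in for a neighborhood-confusion score (not the paper's NC definition).
from collections import Counter
import networkx as nx

def neighborhood_confusion(graph, labels):
    """Return a per-node confusion score in [0, 1]; higher means a more mixed neighborhood."""
    nc = {}
    for v in graph.nodes:
        neigh = list(graph.neighbors(v)) + [v]
        counts = Counter(labels[u] for u in neigh)
        nc[v] = 1.0 - counts.most_common(1)[0][1] / len(neigh)
    return nc

G = nx.karate_club_graph()
labels = {v: (0 if G.nodes[v]["club"] == "Mr. Hi" else 1) for v in G.nodes}
nc = neighborhood_confusion(G, labels)
low = [v for v, s in nc.items() if s < 0.3]    # homophilous-like group
high = [v for v, s in nc.items() if s >= 0.3]  # confused group, trained separately
print(len(low), len(high))
```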


Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things

arXiv.org Artificial Intelligence

With the rapid development of the Artificial Intelligent Internet of Things (AIoT), the volume of image data from AIoT devices has been increasing explosively. In this paper, a novel deep image semantic communication model is proposed for efficient image communication in AIoT. In particular, at the transmitter side, a high-precision image semantic segmentation algorithm is proposed to extract the semantic information of the image, achieving significant compression of the image data. At the receiver side, a semantic image restoration algorithm based on a Generative Adversarial Network (GAN) is proposed to convert the semantic image into a real scene image with detailed information. Simulation results demonstrate that the proposed image semantic communication model can improve the image compression ratio and recovery accuracy by 71.93% and 25.07% on average in comparison with WebP and CycleGAN, respectively. More importantly, our demo experiment shows that the proposed model reduces the total delay of image communication by 95.26% compared with transmitting the original images.
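
As a toy illustration of why transmitting a semantic map can compress aggressively, the sketch below compares the encoded size of a synthetic RGB frame against a single-channel label map with a few classes. This is only a lossless-coding proxy; the paper's segmentation network and GAN-based restoration are not reproduced, and the image sizes and formats are assumptions.

```python
# Toy proxy: a few-class label map encodes to far fewer bytes than the RGB frame it describes.
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)  # stand-in camera frame
semantic = np.zeros((480, 640), dtype=np.uint8)                    # stand-in segmentation map
semantic[:, 320:] = 1
semantic[200:300, 100:250] = 2

def encoded_size(array, fmt):
    buf = io.BytesIO()
    Image.fromarray(array).save(buf, format=fmt)
    return buf.tell()

rgb_bytes = encoded_size(frame, "WEBP")
map_bytes = encoded_size(semantic, "PNG")
print(f"compression ratio vs. WebP frame: {rgb_bytes / map_bytes:.1f}x")
```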


Deep Learning-Based Frequency Offset Estimation

arXiv.org Artificial Intelligence

In wireless communication systems, the lack of synchronization between the oscillators in the transmitter and the receiver, along with the Doppler shift due to relative movement, may lead to the presence of carrier frequency offset (CFO) in the received signals. Estimation of CFO is crucial for subsequent processing such as coherent demodulation. In this brief, we demonstrate the utilization of deep learning for CFO estimation by employing a residual network (ResNet) to learn and extract signal features from the raw in-phase (I) and quadrature (Q) components of the signals. We use multiple modulation schemes in the training set to make the trained model adaptable to multiple modulations or even new signals. In comparison to commonly used traditional CFO estimation methods, our proposed IQ-ResNet method exhibits superior performance across various scenarios, including different oversampling ratios, various signal lengths, and different channels.
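
A small sketch of an IQ-ResNet-style regressor is given below: raw I/Q samples enter as two channels of a 1-D residual CNN and a scalar CFO estimate comes out. The layer widths, block count, and signal length are assumptions; the paper's exact architecture and training setup are not reproduced.

```python
# Sketch of a ResNet-style CFO regressor operating on raw I/Q channels (sizes are assumptions).
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.bn2 = nn.BatchNorm1d(channels)

    def forward(self, x):
        h = torch.relu(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return torch.relu(x + h)  # residual connection

class IQResNet(nn.Module):
    def __init__(self, width=64, blocks=4):
        super().__init__()
        self.stem = nn.Conv1d(2, width, kernel_size=7, padding=3)  # 2 input channels: I and Q
        self.body = nn.Sequential(*[ResBlock1d(width) for _ in range(blocks)])
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(width, 1))

    def forward(self, iq):
        return self.head(self.body(self.stem(iq))).squeeze(-1)

model = IQResNet()
iq_batch = torch.randn(16, 2, 1024)  # 16 signals, 1024 complex samples each
cfo_estimates = model(iq_batch)      # regressed normalized frequency offsets
print(cfo_estimates.shape)           # torch.Size([16])
```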


Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

arXiv.org Artificial Intelligence

The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of the training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that replace the detail coefficients obtained from discrete wavelet transform decomposition and then reconstruct the signal, generating new samples and expanding the training set. Different methods are used to generate the replacement sequences. Simulation results indicate that our proposed methods significantly outperform other augmentation methods.
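
The sketch below illustrates the augmentation idea using PyWavelets: decompose a signal with a single-level discrete wavelet transform, replace the detail coefficients, and reconstruct a new sample. The wavelet choice and the Gaussian replacement sequence are assumptions; the paper explores several generation methods.

```python
# Sketch of wavelet-based augmentation: replace detail coefficients, then reconstruct.
import numpy as np
import pywt

def wavelet_augment(signal, wavelet="db4", rng=None):
    """Return a new sample whose detail coefficients have been replaced."""
    rng = rng or np.random.default_rng()
    approx, detail = pywt.dwt(signal, wavelet)
    # One possible replacement sequence: noise matched to the detail band's statistics.
    new_detail = rng.normal(detail.mean(), detail.std() + 1e-12, size=detail.shape)
    return pywt.idwt(approx, new_detail, wavelet)

# Toy I/Q signal: augment the I and Q channels independently.
iq = np.random.randn(2, 128)
augmented = np.stack([wavelet_augment(channel) for channel in iq])
print(augmented.shape)  # (2, 128)
```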


AIR: Threats of Adversarial Attacks on Deep Learning-Based Information Recovery

arXiv.org Artificial Intelligence

A wireless communication system usually consists of a transmitter, which transmits the information, and a receiver, which recovers the original information from the received distorted signal. Deep learning (DL) has been used to improve the performance of the receiver in complicated channel environments, and state-of-the-art (SOTA) performance has been achieved. However, its robustness has not been investigated. To evaluate the robustness of DL-based information recovery models under adversarial circumstances, we investigate adversarial attacks on the SOTA DL-based information recovery model, i.e., DeepReceiver. We formulate the problem as an optimization problem with power and peak-to-average power ratio (PAPR) constraints. We design different adversarial attack methods according to the adversary's knowledge of DeepReceiver's model and/or testing samples. Extensive experiments show that DeepReceiver is vulnerable to the designed attack methods in all of the considered scenarios. Even in the scenario where both the model and the test samples are restricted, the adversary can attack DeepReceiver and increase its bit error rate (BER) above 10%. We also find that DeepReceiver is vulnerable to adversarial perturbations even with very low power and limited PAPR. These results suggest that defense measures should be taken to enhance the robustness of DeepReceiver.
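
The sketch below shows a generic projected-gradient perturbation search under power and PAPR constraints, in the spirit of the attack formulation above. The victim model here is a placeholder linear "receiver", not DeepReceiver, and the constraint values, step size, and loss surrogate are assumptions.

```python
# Sketch of a perturbation search under power and PAPR constraints (placeholder victim model).
import torch

def project_power_papr(delta, max_power, max_papr):
    """Project a perturbation onto the power budget, then clip peaks to respect PAPR."""
    power = delta.pow(2).mean()
    if power > max_power:
        delta = delta * torch.sqrt(max_power / power)
    peak_limit = torch.sqrt(max_papr * delta.pow(2).mean())
    return delta.clamp(-peak_limit, peak_limit)

def attack(model, iq, true_bits, steps=50, lr=0.01, max_power=1e-3, max_papr=4.0):
    delta = torch.zeros_like(iq, requires_grad=True)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        logits = model(iq + delta)
        loss = loss_fn(logits, true_bits)      # surrogate for the bit error rate
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()    # gradient-ascent step to push bits away from the truth
            delta.copy_(project_power_papr(delta, torch.tensor(max_power), torch.tensor(max_papr)))
        delta.grad.zero_()
    return delta.detach()

# Placeholder "receiver": maps 2x128 I/Q samples to 64 bit logits.
receiver = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(2 * 128, 64))
iq = torch.randn(1, 2, 128)
true_bits = torch.randint(0, 2, (1, 64)).float()
perturbation = attack(receiver, iq, true_bits)
print(perturbation.abs().max())
```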