Not enough data to create a plot.
Try a different view from the menu above.
Cui, Yufei
VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving
Hu, Haibo, Zuo, Jiacheng, Lou, Yang, Cui, Yufei, Wang, Jianping, Guan, Nan, Wang, Jin, Li, Yung-Hui, Xue, Chun Jason
With the widespread adoption and deployment of autonomous driving, handling complex environments has become an unavoidable challenge. Due to the scarcity and diversity of extreme scenario datasets, current autonomous driving models struggle to effectively manage corner cases. This limitation poses a significant safety risk, according to the National Highway Traffic Safety Administration (NHTSA), autonomous vehicle systems have been involved in hundreds of reported crashes annually in the United States, occurred in corner cases like sun glare and fog, which caused a few fatal accident. Furthermore, in order to consistently maintain a robust and reliable autonomous driving system, it is essential for models not only to perform well on routine scenarios but also to adapt to newly emerging scenarios, especially those corner cases that deviate from the norm. This requires a learning mechanism that incrementally integrates new knowledge without degrading previously acquired capabilities. However, to the best of our knowledge, no existing continual learning methods have been proposed to ensure consistent and scalable corner case learning in autonomous driving. To address these limitations, we propose VLM-C4L, a continual learning framework that introduces Vision-Language Models (VLMs) to dynamically optimize and enhance corner case datasets, and VLM-C4L combines VLM-guided high-quality data extraction with a core data replay strategy, enabling the model to incrementally learn from diverse corner cases while preserving performance on previously routine scenarios, thus ensuring long-term stability and adaptability in real-world autonomous driving. We evaluate VLM-C4L on large-scale real-world autonomous driving datasets, including Waymo and the corner case dataset CODA.
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Wang, Xinyu, Ma, Linrui, Huang, Jerry, Lu, Peng, Parthasarathi, Prasanna, Chang, Xiao-Wen, Chen, Boxing, Cui, Yufei
Recent shifts in the space of large language model (LLM) research have shown an increasing focus on novel architectures to compete with prototypical Transformer-based models that have long dominated this space. Linear recurrent models have proven to be a viable competitor due to their computational efficiency. However, such models still demonstrate a sizable gap compared to Transformers in terms of in-context learning among other tasks that require recalling information from a context. In this work, we introduce __Resona__, a simple and scalable framework for augmenting linear recurrent models with retrieval. __Resona__~augments models with the ability to integrate retrieved information from the provided input context, enabling tailored behavior to diverse task requirements. Experiments on a variety of linear recurrent models demonstrate that __Resona__-augmented models observe significant performance gains on a variety of synthetic as well as real-world natural language tasks, highlighting its ability to act as a general purpose method to improve the in-context learning and language modeling abilities of linear recurrent LLMs.
RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning
Zuo, Jiacheng, Hu, Haibo, Zhou, Zikang, Cui, Yufei, Liu, Ziquan, Wang, Jianping, Guan, Nan, Wang, Jin, Xue, Chun Jason
In the pursuit of robust autonomous driving systems, models trained on real-world datasets often struggle to adapt to new environments, particularly when confronted with corner cases such as extreme weather conditions. Collecting these corner cases in the real world is non-trivial, which necessitates the use of simulators for validation. However,the high computational cost and the domain gap in data distribution have hindered the seamless transition between real and simulated driving scenarios. To tackle this challenge, we propose Retrieval-Augmented Learning for Autonomous Driving (RALAD), a novel framework designed to bridge the real-to-sim gap at a low cost. RALAD features three primary designs, including (1) domain adaptation via an enhanced Optimal Transport (OT) method that accounts for both individual and grouped image distances, (2) a simple and unified framework that can be applied to various models, and (3) efficient fine-tuning techniques that freeze the computationally expensive layers while maintaining robustness. Experimental results demonstrate that RALAD compensates for the performance degradation in simulated environments while maintaining accuracy in real-world scenarios across three different models. Taking Cross View as an example, the mIOU and mAP metrics in real-world scenarios remain stable before and after RALAD fine-tuning, while in simulated environments,the mIOU and mAP metrics are improved by 10.30% and 12.29%, respectively. Moreover, the re-training cost of our approach is reduced by approximately 88.1%. Our code is available at https://github.com/JiachengZuo/RALAD.git.
Retrieval-Augmented Generation for Natural Language Processing: A Survey
Wu, Shangyu, Xiong, Ying, Cui, Yufei, Wu, Haolun, Chen, Can, Yuan, Ye, Huang, Lianming, Liu, Xue, Kuo, Tei-Wei, Guan, Nan, Xue, Chun Jason
Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database to augment LLMs, makes up those drawbacks of LLMs. This paper reviews all significant techniques of RAG, especially in the retriever and the retrieval fusions. Besides, tutorial codes are provided for implementing the representative techniques in RAG. This paper further discusses the RAG training, including RAG with/without datastore update. Then, we introduce the application of RAG in representative natural language processing tasks and industrial scenarios. Finally, this paper discusses the future directions and challenges of RAG for promoting its development.
RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference
Huang, Lianming, Wu, Shangyu, Cui, Yufei, Xiong, Ying, Liu, Xue, Kuo, Tei-Wei, Guan, Nan, Xue, Chun Jason
Deploying large language model inference remains challenging due to their high computational overhead. Early exiting accelerates model inference by adaptively reducing the number of inference layers. Existing methods require training internal classifiers to determine whether to exit at each intermediate layer. However, such classifier-based early exiting frameworks require significant effort to design and train the classifiers. To address these limitations, this paper proposes RAEE, a training-free Retrieval-Augmented Early Exiting framework for efficient inference. First, this paper demonstrates that the early exiting problem can be modeled as a distribution prediction problem, where the distribution is approximated using similar data's existing information. Next, the paper details the process of collecting existing information to build the retrieval database. Finally, based on the pre-built retrieval database, RAEE leverages the retrieved similar data's exiting information to guide the backbone model to exit at the layer, which is predicted by the approximated distribution. Experimental results demonstrate that the proposed RAEE can significantly accelerate inference. RAEE also achieves state-of-the-art zero-shot performance on 8 classification tasks.
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities
Zhou, Hao, Hu, Chengming, Yuan, Ye, Cui, Yufei, Jin, Yili, Chen, Can, Wu, Haolun, Yuan, Dun, Jiang, Li, Wu, Di, Liu, Xue, Zhang, Charlie, Wang, Xianbin, Liu, Jiangchuan
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discussed time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.
The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks
Liu, Ziquan, Cui, Yufei, Yan, Yan, Xu, Yi, Ji, Xiangyang, Liu, Xue, Chan, Antoni B.
In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underpins the importance of adversarial training for CP. Our paper next demonstrates that the prediction set size (PSS) of CP using adversarially trained models with AT variants is often worse than using standard AT, inspiring us to research into CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR).
Pre-processing matters: A segment search method for WSI classification
Wang, Jun, Cui, Yufei, Mao, Yu, Guan, Nan, Xue, Chun Jason
Pre-processing for whole slide images can affect classification performance both in the training and inference stages. Our study analyzes the impact of pre-processing parameters on inference and training across single- and multiple-domain datasets. However, searching for an optimal parameter set is time-consuming. To overcome this, we propose a novel Similarity-based Simulated Annealing approach for fast parameter tuning to enhance inference performance on single-domain data. Our method demonstrates significant performance improvements in accuracy, which raise accuracy from 0.512 to 0.847 in a single domain. We further extend our insight into training performance in multi-domain data by employing a novel Bayesian optimization to search optimal pre-processing parameters, resulting in a high AUC of 0.967. We highlight that better pre-processing for WSI can contribute to further accuracy improvement in the histology area.
Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion
Wu, Shangyu, Xiong, Ying, Cui, Yufei, Liu, Xue, Tang, Buzhou, Kuo, Tei-Wei, Xue, Chun Jason
Retrieval-based augmentations that aim to incorporate knowledge from an external database into language models have achieved great success in various knowledge-intensive (KI) tasks, such as question-answering and text generation. However, integrating retrievals in non-knowledge-intensive (NKI) tasks, such as text classification, is still challenging. Existing works focus on concatenating retrievals to inputs as context to form the prompt-based inputs. Unfortunately, such methods require language models to have the capability to handle long texts. Besides, inferring such concatenated data would also consume a significant amount of computational resources. To solve these challenges, we propose \textbf{ReFusion} in this paper, a computation-efficient \textbf{Re}trieval representation \textbf{Fusion} with neural architecture search. The main idea is to directly fuse the retrieval representations into the language models. Specifically, we first propose an online retrieval module that retrieves representations of similar sentences. Then, we present a retrieval fusion module including two effective ranking schemes, i.e., reranker-based scheme and ordered-mask-based scheme, to fuse the retrieval representations with hidden states. Furthermore, we use Neural Architecture Search (NAS) to seek the optimal fusion structure across different layers. Finally, we conduct comprehensive experiments, and the results demonstrate our ReFusion can achieve superior and robust performance on various NKI tasks.
The Implicit Biases of Stochastic Gradient Descent on Deep Neural Networks with Batch Normalization
Liu, Ziquan, Cui, Yufei, Wan, Jia, Mao, Yu, Chan, Antoni B.
Deep neural networks with batch normalization (BN-DNNs) are invariant to weight rescaling due to their normalization operations. However, using weight decay (WD) benefits these weight-scale-invariant networks, which is often attributed to an increase of the effective learning rate when the weight norms are decreased. In this paper, we demonstrate the insufficiency of the previous explanation and investigate the implicit biases of stochastic gradient descent (SGD) on BN-DNNs to provide a theoretical explanation for the efficacy of weight decay. We identity two implicit biases of SGD on BN-DNNs: 1) the weight norms in SGD training remain constant in the continuous-time domain and keep increasing in the discrete-time domain; 2) SGD optimizes weight vectors in fully-connected networks or convolution kernels in convolution neural networks by updating components lying in the input feature span, while leaving those components orthogonal to the input feature span unchanged. Thus, SGD without WD accumulates weight noise orthogonal to the input feature span, and cannot eliminate such noise. Our empirical studies corroborate the hypothesis that weight decay suppresses weight noise that is left untouched by SGD. Furthermore, we propose to use weight rescaling (WRS) instead of weight decay to achieve the same regularization effect, while avoiding performance degradation of WD on some momentum-based optimizers. Our empirical results on image recognition show that regardless of optimization methods and network architectures, training BN-DNNs using WRS achieves similar or better performance compared with using WD. We also show that training with WRS generalizes better compared to WD, on other computer vision tasks.