Li, Dingcheng
Backtracking for Safety
Sel, Bilgehan, Li, Dingcheng, Wallis, Phillip, Keshava, Vaishakh, Jin, Ming, Jonnalagadda, Siddhartha Reddy
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks, but ensuring their safety and alignment with human values remains crucial. Current safety alignment methods, such as supervised fine-tuning and reinforcement learning-based approaches, can exhibit vulnerabilities to adversarial attacks and often result in shallow safety alignment, primarily focusing on preventing harmful content in the initial tokens of the generated output. While methods like resetting can help recover from unsafe generations by discarding previous tokens and restarting the generation process, they are not well suited to nuanced safety violations, such as toxicity, that may arise within otherwise benign and lengthy generations. In this paper, we propose a novel backtracking method designed to address these limitations. Our method allows the model to revert to a safer generation state, not necessarily at the beginning, when safety violations occur during generation. This approach enables targeted correction of problematic segments without discarding the entire generated text, thereby preserving efficiency. We demonstrate that our method dramatically reduces toxicity arising during the generation process with minimal impact on efficiency.
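As a rough sketch of the backtracking idea (illustrative only, not the paper's algorithm), the snippet below assumes a Hugging Face-style causal LM and tokenizer plus a hypothetical toxicity_score(text) -> float function: tokens are sampled autoregressively, the newly generated span is scored every few tokens, and a violation triggers a revert to the last safe prefix with a slightly higher resampling temperature rather than a full restart. The threshold, span length, and retry limit are arbitrary choices for the example.

```python
import torch

@torch.no_grad()
def generate_with_backtracking(model, tokenizer, prompt, toxicity_score,
                               max_new_tokens=128, check_every=16,
                               threshold=0.5, max_retries=3):
    """Sample tokens one at a time; every `check_every` tokens, score the newly
    generated span. If it looks toxic, revert to the last safe prefix and
    resample with a higher temperature instead of restarting from scratch."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_len = ids.shape[1]
    safe_ids = ids                     # longest prefix that has passed the check
    retries, temperature = 0, 1.0
    while ids.shape[1] < prompt_len + max_new_tokens:
        logits = model(ids).logits[:, -1, :] / temperature
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, next_id], dim=1)
        if ids.shape[1] - safe_ids.shape[1] >= check_every:
            span = tokenizer.decode(ids[0, safe_ids.shape[1]:])
            if toxicity_score(span) > threshold and retries < max_retries:
                ids = safe_ids                     # backtrack: drop only the bad span
                retries += 1
                temperature = 1.0 + 0.3 * retries  # steer sampling away from the bad mode
            else:
                safe_ids = ids                     # commit the span as safe
                retries, temperature = 0, 1.0
    return tokenizer.decode(ids[0, prompt_len:])
```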
LSEBMCL: A Latent Space Energy-Based Model for Continual Learning
Li, Xiaodi, Li, Dingcheng, Gao, Rujun, Zamani, Mahmoud, Khan, Latifur
Continual learning has become essential in many practical applications such as online news summarization and product classification. The primary challenge is catastrophic forgetting, a phenomenon in which a model inadvertently discards previously learned knowledge when it is trained on new tasks. Existing solutions involve storing exemplars from previous classes, regularizing parameters during the fine-tuning process, or assigning different model parameters to each task. The solution proposed in this work, LSEBMCL (Latent Space Energy-Based Model for Continual Learning), uses energy-based models (EBMs) to prevent catastrophic forgetting by sampling data points from previous tasks when training on new ones. An EBM is a machine learning model that associates an energy value with each input data point. The proposed method uses an EBM layer as an outer-generator in the continual learning framework for NLP tasks. The study demonstrates the efficacy of EBMs in NLP tasks, achieving state-of-the-art results in all experiments.
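A minimal sketch of the underlying idea of EBM-based replay, assuming a fixed feature space: an energy network scores feature vectors, short-run Langevin dynamics draws approximate samples from the energy landscape fitted to earlier tasks, and those pseudo-samples are mixed into the current task's batch. The network sizes, step counts, and the replay_batch helper are illustrative, not the paper's exact LSEBMCL architecture.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scalar energy over feature vectors of previously seen tasks."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.SiLU(), nn.Linear(256, 1))

    def forward(self, h):
        return self.net(h).squeeze(-1)

def langevin_sample(ebm, n, dim=128, steps=30, step_size=0.01):
    """Draw approximate samples from p(h) proportional to exp(-E(h))
    via short-run Langevin dynamics started from Gaussian noise."""
    h = torch.randn(n, dim, requires_grad=True)
    for _ in range(steps):
        energy = ebm(h).sum()
        grad, = torch.autograd.grad(energy, h)
        h = (h - 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(h)).detach().requires_grad_(True)
    return h.detach()

def replay_batch(ebm, new_feats, n_replay=32):
    """Mix real features from the current task with EBM-sampled pseudo-features
    from earlier tasks, so the downstream model keeps seeing 'old' data."""
    old_feats = langevin_sample(ebm, n_replay, dim=new_feats.shape[1])
    return torch.cat([new_feats, old_feats], dim=0)
```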
Word Embedding with Neural Probabilistic Prior
Ren, Shaogang, Li, Dingcheng, Li, Ping
Pre-trained word embedding models can effectively integrate the learned prior knowledge and the information from the specific tasks in hand [34, 9, 44, 36]. These models usually are capable of capturing the word token order information among the large number of sentences from a corpus by leveraging recurrent neural networks [16] and/or attention mechanism [43]. Training of pre-trained models comes with high costs such as large training corpora, long computation hours, and financial costs. Those may also reduce the models' flexibility in application scenarios, e.g., when the training corpus or dataset is small [7]. To improve word representation learning, we propose a probabilistic prior which can be seamlessly integrated with word embedding models. Different from previous methods, word embedding is taken as a probabilistic generative model, and it enables us to impose a prior regularizing word representation learning. The proposed prior not only enhances the representation of embedding vectors but also improves the model's robustness and stability. The structure of the proposed prior is simple and effective, and it can be easily implemented.
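To illustrate how a prior can be composed with a word embedding objective (only an illustration; the paper's neural probabilistic prior is more expressive), the sketch below adds a simple isotropic Gaussian log-prior, i.e. a MAP/L2 term, to skip-gram with negative sampling. The prior_logp method and the weighting are hypothetical hooks where a learned prior would be plugged in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramWithPrior(nn.Module):
    """Skip-gram with negative sampling plus a prior penalty on embedding vectors.
    The prior here is an isotropic Gaussian (an L2/MAP term); a learned
    probabilistic prior would replace `prior_logp`."""
    def __init__(self, vocab_size, dim=100, prior_weight=1e-3):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)
        self.out_emb = nn.Embedding(vocab_size, dim)
        self.prior_weight = prior_weight

    def prior_logp(self, v):
        # log N(v; 0, I) up to an additive constant
        return -0.5 * (v ** 2).sum(dim=-1)

    def forward(self, center, context, negatives):
        v = self.in_emb(center)                         # (B, D)
        pos = (v * self.out_emb(context)).sum(-1)       # (B,)
        neg = torch.bmm(self.out_emb(negatives), v.unsqueeze(-1)).squeeze(-1)  # (B, K)
        nll = -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()
        reg = -self.prior_logp(v).mean()                # prior regularizer on representations
        return nll + self.prior_weight * reg
```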
A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference
Xie, Jianwen, Zhu, Yaxuan, Xu, Yifei, Li, Dingcheng, Li, Ping
We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow plays the role of an informative prior for the generator. We propose to jointly learn the latent space normalizing flow prior and the top-down generator with a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm: short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, and the parameters of the normalizing flow prior and the generator are then updated with the inferred latent variables. We show that, under non-convergent short-run MCMC, the finite-step Langevin dynamics acts as a flow-like approximate inference model and the learning objective follows a perturbation of maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow to the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model away from MLE so that the short-run Langevin flow inference stays close to the true posterior. Extensive experiments validate the effectiveness of the proposed latent space normalizing flow model on image generation, image reconstruction, anomaly detection, supervised image inpainting, and unsupervised image recovery.
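The short-run Langevin inference step described above can be sketched as follows, assuming the flow prior's log-density is available as a callable prior_log_prob(z) (a stand-in for the jointly learned normalizing flow) and a simple Gaussian generator p(x|z) = N(g(z), sigma^2 I); a fixed, small number of Langevin steps on log p(z) + log p(x|z) yields approximate posterior samples of z for each observation. Network sizes and step sizes are placeholders for the example.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Top-down generator p(x|z) = N(g(z), sigma^2 I)."""
    def __init__(self, z_dim=64, x_dim=784, sigma=0.3):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
        self.sigma = sigma

    def log_likelihood(self, x, z):
        return -0.5 * ((x - self.g(z)) ** 2).sum(-1) / self.sigma ** 2

def short_run_langevin(x, generator, prior_log_prob, z_dim=64, steps=20, step_size=0.05):
    """Short-run Langevin flow: a fixed, small number of Langevin steps on
    log p(z) + log p(x|z), started from Gaussian noise, used as an approximate
    posterior sampler for the latent z of each observation x."""
    z = torch.randn(x.shape[0], z_dim, requires_grad=True)
    for _ in range(steps):
        log_joint = (prior_log_prob(z) + generator.log_likelihood(x, z)).sum()
        grad, = torch.autograd.grad(log_joint, z)
        z = (z + 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(z)).detach().requires_grad_(True)
    return z.detach()

# The inferred z would then drive the MLE-style updates of both the normalizing
# flow prior (matching the aggregated posterior) and the generator (reconstructing x).
```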