Li, Dingcheng
Backtracking for Safety
Sel, Bilgehan, Li, Dingcheng, Wallis, Phillip, Keshava, Vaishakh, Jin, Ming, Jonnalagadda, Siddhartha Reddy
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks, but ensuring their safety and alignment with human values remains crucial. Current safety alignment methods, such as supervised fine-tuning and reinforcement learning-based approaches, can exhibit vulnerabilities to adversarial attacks and often result in shallow safety alignment, primarily focusing on preventing harmful content in the initial tokens of the generated output. While methods like resetting can help recover from unsafe generations by discarding previous tokens and restarting the generation process, they are not well suited to nuanced safety violations, such as toxicity, that may arise within otherwise benign and lengthy generations. In this paper, we propose a novel backtracking method designed to address these limitations. Our method allows the model to revert to a safer generation state, not necessarily at the beginning, when safety violations occur during generation. This approach enables targeted correction of problematic segments without discarding the entire generated text, thereby preserving efficiency. We demonstrate that our method dramatically reduces toxicity arising during the generation process with minimal impact on efficiency.
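As a rough sketch of the backtracking idea (illustrative only, not the paper's algorithm), the snippet below assumes a Hugging Face-style causal LM and tokenizer plus a hypothetical toxicity_score(text) -> float function: tokens are sampled autoregressively, the newly generated span is scored every few tokens, and a violation triggers a revert to the last safe prefix with a slightly higher resampling temperature rather than a full restart. The threshold, span length, and retry limit are arbitrary choices for the example.

```python
import torch

@torch.no_grad()
def generate_with_backtracking(model, tokenizer, prompt, toxicity_score,
                               max_new_tokens=128, check_every=16,
                               threshold=0.5, max_retries=3):
    """Sample tokens one at a time; every `check_every` tokens, score the newly
    generated span. If it looks toxic, revert to the last safe prefix and
    resample with a higher temperature instead of restarting from scratch."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    prompt_len = ids.shape[1]
    safe_ids = ids                     # longest prefix that has passed the check
    retries, temperature = 0, 1.0
    while ids.shape[1] < prompt_len + max_new_tokens:
        logits = model(ids).logits[:, -1, :] / temperature
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        ids = torch.cat([ids, next_id], dim=1)
        if ids.shape[1] - safe_ids.shape[1] >= check_every:
            span = tokenizer.decode(ids[0, safe_ids.shape[1]:])
            if toxicity_score(span) > threshold and retries < max_retries:
                ids = safe_ids                     # backtrack: drop only the bad span
                retries += 1
                temperature = 1.0 + 0.3 * retries  # steer sampling away from the bad mode
            else:
                safe_ids = ids                     # commit the span as safe
                retries, temperature = 0, 1.0
    return tokenizer.decode(ids[0, prompt_len:])
```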
LSEBMCL: A Latent Space Energy-Based Model for Continual Learning
Li, Xiaodi, Li, Dingcheng, Gao, Rujun, Zamani, Mahmoud, Khan, Latifur
Continual learning has become essential in many practical applications such as online news summarization and product classification. The primary challenge is catastrophic forgetting, a phenomenon in which a model inadvertently discards previously learned knowledge when it is trained on new tasks. Existing solutions involve storing exemplars from previous classes, regularizing parameters during the fine-tuning process, or assigning different model parameters to each task. The solution proposed in this work, LSEBMCL (Latent Space Energy-Based Model for Continual Learning), uses energy-based models (EBMs) to prevent catastrophic forgetting by sampling data points from previous tasks when training on new ones. An EBM is a machine learning model that associates an energy value with each input data point. The proposed method uses an EBM layer as an outer-generator in the continual learning framework for NLP tasks. The study demonstrates the efficacy of EBMs in NLP tasks, achieving state-of-the-art results in all experiments.
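A minimal sketch of the underlying idea of EBM-based replay, assuming a fixed feature space: an energy network scores feature vectors, short-run Langevin dynamics draws approximate samples from the energy landscape fitted to earlier tasks, and those pseudo-samples are mixed into the current task's batch. The network sizes, step counts, and the replay_batch helper are illustrative, not the paper's exact LSEBMCL architecture.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Scalar energy over feature vectors of previously seen tasks."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.SiLU(), nn.Linear(256, 1))

    def forward(self, h):
        return self.net(h).squeeze(-1)

def langevin_sample(ebm, n, dim=128, steps=30, step_size=0.01):
    """Draw approximate samples from p(h) proportional to exp(-E(h))
    via short-run Langevin dynamics started from Gaussian noise."""
    h = torch.randn(n, dim, requires_grad=True)
    for _ in range(steps):
        energy = ebm(h).sum()
        grad, = torch.autograd.grad(energy, h)
        h = (h - 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(h)).detach().requires_grad_(True)
    return h.detach()

def replay_batch(ebm, new_feats, n_replay=32):
    """Mix real features from the current task with EBM-sampled pseudo-features
    from earlier tasks, so the downstream model keeps seeing 'old' data."""
    old_feats = langevin_sample(ebm, n_replay, dim=new_feats.shape[1])
    return torch.cat([new_feats, old_feats], dim=0)
```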
Word Embedding with Neural Probabilistic Prior
Ren, Shaogang, Li, Dingcheng, Li, Ping
Pre-trained word embedding models can effectively integrate the learned prior knowledge and the information from the specific tasks in hand [34, 9, 44, 36]. These models usually are capable of capturing the word token order information among the large number of sentences from a corpus by leveraging recurrent neural networks [16] and/or attention mechanism [43]. Training of pre-trained models comes with high costs such as large training corpora, long computation hours, and financial costs. Those may also reduce the models' flexibility in application scenarios, e.g., when the training corpus or dataset is small [7]. To improve word representation learning, we propose a probabilistic prior which can be seamlessly integrated with word embedding models. Different from previous methods, word embedding is taken as a probabilistic generative model, and it enables us to impose a prior regularizing word representation learning. The proposed prior not only enhances the representation of embedding vectors but also improves the model's robustness and stability. The structure of the proposed prior is simple and effective, and it can be easily implemented.
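To illustrate how a prior can be composed with a word embedding objective (only an illustration; the paper's neural probabilistic prior is more expressive), the sketch below adds a simple isotropic Gaussian log-prior, i.e. a MAP/L2 term, to skip-gram with negative sampling. The prior_logp method and the weighting are hypothetical hooks where a learned prior would be plugged in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGramWithPrior(nn.Module):
    """Skip-gram with negative sampling plus a prior penalty on embedding vectors.
    The prior here is an isotropic Gaussian (an L2/MAP term); a learned
    probabilistic prior would replace `prior_logp`."""
    def __init__(self, vocab_size, dim=100, prior_weight=1e-3):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)
        self.out_emb = nn.Embedding(vocab_size, dim)
        self.prior_weight = prior_weight

    def prior_logp(self, v):
        # log N(v; 0, I) up to an additive constant
        return -0.5 * (v ** 2).sum(dim=-1)

    def forward(self, center, context, negatives):
        v = self.in_emb(center)                         # (B, D)
        pos = (v * self.out_emb(context)).sum(-1)       # (B,)
        neg = torch.bmm(self.out_emb(negatives), v.unsqueeze(-1)).squeeze(-1)  # (B, K)
        nll = -(F.logsigmoid(pos) + F.logsigmoid(-neg).sum(-1)).mean()
        reg = -self.prior_logp(v).mean()                # prior regularizer on representations
        return nll + self.prior_weight * reg
```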
A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference
Xie, Jianwen, Zhu, Yaxuan, Xu, Yifei, Li, Dingcheng, Li, Ping
We study a normalizing flow in the latent space of a top-down generator model, in which the normalizing flow plays the role of an informative prior for the generator. We propose to jointly learn the latent space normalizing flow prior and the top-down generator with a Markov chain Monte Carlo (MCMC)-based maximum likelihood algorithm: short-run Langevin sampling from the intractable posterior distribution is performed to infer the latent variables for each observed example, and the parameters of the normalizing flow prior and the generator are then updated with the inferred latent variables. We show that, under non-convergent short-run MCMC, the finite-step Langevin dynamics acts as a flow-like approximate inference model and the learning objective follows a perturbation of maximum likelihood estimation (MLE). We further point out that the learning framework seeks to (i) match the latent space normalizing flow to the aggregated posterior produced by the short-run Langevin flow, and (ii) bias the model away from MLE so that the short-run Langevin flow inference stays close to the true posterior. Extensive experiments validate the effectiveness of the proposed latent space normalizing flow model on image generation, image reconstruction, anomaly detection, supervised image inpainting, and unsupervised image recovery.
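The short-run Langevin inference step described above can be sketched as follows, assuming the flow prior's log-density is available as a callable prior_log_prob(z) (a stand-in for the jointly learned normalizing flow) and a simple Gaussian generator p(x|z) = N(g(z), sigma^2 I); a fixed, small number of Langevin steps on log p(z) + log p(x|z) yields approximate posterior samples of z for each observation. Network sizes and step sizes are placeholders for the example.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Top-down generator p(x|z) = N(g(z), sigma^2 I)."""
    def __init__(self, z_dim=64, x_dim=784, sigma=0.3):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))
        self.sigma = sigma

    def log_likelihood(self, x, z):
        return -0.5 * ((x - self.g(z)) ** 2).sum(-1) / self.sigma ** 2

def short_run_langevin(x, generator, prior_log_prob, z_dim=64, steps=20, step_size=0.05):
    """Short-run Langevin flow: a fixed, small number of Langevin steps on
    log p(z) + log p(x|z), started from Gaussian noise, used as an approximate
    posterior sampler for the latent z of each observation x."""
    z = torch.randn(x.shape[0], z_dim, requires_grad=True)
    for _ in range(steps):
        log_joint = (prior_log_prob(z) + generator.log_likelihood(x, z)).sum()
        grad, = torch.autograd.grad(log_joint, z)
        z = (z + 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(z)).detach().requires_grad_(True)
    return z.detach()

# The inferred z would then drive the MLE-style updates of both the normalizing
# flow prior (matching the aggregated posterior) and the generator (reconstructing x).
```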