Hong, Yihuai
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
Hong, Yihuai, Zhou, Dian, Cao, Meng, Yu, Lei, Jin, Zhijing
Large language models (LLMs) excel on a variety of reasoning benchmarks, but previous studies suggest they sometimes struggle to generalize to unseen questions, potentially due to over-reliance on memorized training examples. However, the precise conditions under which LLMs switch between reasoning and memorization during text generation remain unclear. In this work, we provide a mechanistic understanding of LLMs' reasoning-memorization dynamics by identifying a set of linear features in the model's residual stream that govern the balance between genuine reasoning and memory recall. These features not only distinguish reasoning tasks from memory-intensive ones but can also be manipulated to causally influence model performance on reasoning tasks. Additionally, we show that intervening in these reasoning features helps the model more accurately activate the most relevant problem-solving capabilities during answer generation. Our findings offer new insights into the underlying mechanisms of reasoning and memory in LLMs and pave the way for the development of more robust and interpretable generative AI systems.
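For intuition, here is a minimal sketch of the kind of residual-stream intervention the abstract describes: adding a scaled direction to one layer's hidden states through a forward hook. The model (TinyLlama as a light stand-in), the layer index, the steering strength, and the randomly initialized `reasoning_direction` are all illustrative assumptions; the paper's actual direction would be estimated from model activations, not drawn at random.

```python
# Sketch: steer generation along a single residual-stream direction by adding
# a scaled vector to one decoder layer's output via a forward hook.
# Model, layer index, strength, and the random direction are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # light stand-in for a larger LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical unit-norm direction in the residual stream, e.g. one that could be
# estimated by contrasting activations on reasoning vs. memory-recall prompts.
d_model = model.config.hidden_size
reasoning_direction = torch.randn(d_model)
reasoning_direction /= reasoning_direction.norm()

alpha = 4.0       # steering strength; flipping the sign pushes the other way
layer_idx = 12    # which decoder layer's output to modify

def steer_hook(module, inputs, output):
    # Decoder layers typically return a tuple whose first element is the hidden states.
    if isinstance(output, tuple):
        steered = output[0] + alpha * reasoning_direction.to(output[0].dtype)
        return (steered,) + output[1:]
    return output + alpha * reasoning_direction.to(output.dtype)

handle = model.model.layers[layer_idx].register_forward_hook(steer_hook)
try:
    prompt = "If every widget costs 3 dollars, how much do 7 widgets cost?"
    ids = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so later generations run unmodified
```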
Dissecting Fine-Tuning Unlearning in Large Language Models
Hong, Yihuai, Zou, Yuelin, Hu, Lijie, Zeng, Ziqian, Wang, Di, Yang, Haiqin
Consequently, recent research has focused on developing efficient unlearning methods as a post-training technique to selectively unlearn the specific knowledge (Blanco-Justicia et al., 2024; Liu et al., 2024). … 2023; Jang et al., 2023; Yao et al., 2024; Rafailov et al., 2023), with corresponding adjustments and designs in the loss function to facilitate … Although earlier investigations (Hong et al., 2024; Lee et al., 2024a) have … of these fine-tuning-based unlearning methods on LLaMA2-7B-chat (Touvron et al., 2023) and OLMo-7B (Groeneveld et al., 2024) by implementing them on the respective pretraining datasets of … We discover that while these methods appear to effectively unlearn target knowledge, they also inevitably affect the output and behavior related to unrelated knowledge.
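The excerpt above concerns fine-tuning-based unlearning objectives that pair an adjusted loss on the knowledge to be forgotten with ordinary training on everything else. A minimal sketch of one common design, gradient ascent on a forget set combined with a standard language-modeling loss on a retain set, is shown below; the model (gpt2 as a light stand-in), the loss weighting, and the placeholder texts are illustrative assumptions, not the paper's setup.

```python
# Sketch of a common fine-tuning-based unlearning objective:
# maximize loss on "forget" examples (gradient ascent) while keeping the
# ordinary language-modeling loss on "retain" examples.
# Model choice, weighting, and placeholder texts are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies LLaMA2-7B-chat and OLMo-7B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def lm_loss(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # labels == input_ids gives the standard next-token cross-entropy loss
    return model(**batch, labels=batch["input_ids"]).loss

forget_texts = ["<text containing the knowledge to unlearn>"]
retain_texts = ["<unrelated text whose behavior should be preserved>"]

retain_weight = 1.0
for step in range(10):  # a few illustrative update steps
    loss = -lm_loss(forget_texts) + retain_weight * lm_loss(retain_texts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```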
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces
Hong, Yihuai, Yu, Lei, Ravfogel, Shauli, Yang, Haiqin, Geva, Mor
The task of "unlearning" certain concepts in large language models (LLMs) has attracted immense attention recently, due to its importance for mitigating undesirable model behaviours, such as the generation of harmful, private, or incorrect information. Current protocols to evaluate unlearning methods largely rely on behavioral tests, without monitoring the presence of unlearned knowledge within the model's parameters. This residual knowledge can be adversarially exploited to recover the erased information post-unlearning. We argue that unlearning should also be evaluated internally, by considering changes in the parametric knowledge traces of the unlearned concepts. To this end, we propose a general methodology for eliciting directions in the parameter space (termed "concept vectors") that encode concrete concepts, and construct ConceptVectors, a benchmark dataset containing hundreds of common concepts and their parametric knowledge traces within two open-source LLMs. Evaluation on ConceptVectors shows that existing unlearning methods minimally impact concept vectors, while directly ablating these vectors demonstrably removes the associated knowledge from the LLMs and significantly reduces their susceptibility to adversarial manipulation. Our results highlight limitations in behavioral-based unlearning evaluations and call for future work to include parametric-based evaluations. To support this, we release our code and benchmark at https://github.com/yihuaihong/ConceptVectors.
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
Zeng, Ziqian, Hong, Yihuai, Dai, Hongliang, Zhuang, Huiping, Chen, Cen
Early Exiting is one of the most popular methods to achieve efficient inference. Current early exiting methods adopt the (weighted) sum of the cross-entropy losses of all internal classifiers during training, requiring all of these classifiers to predict every instance correctly. During inference, however, as long as one internal classifier predicts an instance correctly, inference can be accelerated without losing accuracy, so there is a notable gap between training and inference. We propose ConsistentEE, an early exiting method that is consistent between training and inference. ConsistentEE formulates the early exiting process as a reinforcement learning problem: a policy network is added to decide whether an instance should exit or continue. The training objective of ConsistentEE only requires each instance to be predicted correctly by one internal classifier. Additionally, we introduce the concept of the Memorized Layer to measure the hardness of an instance and incorporate it into the reward function design, which allows "easy" instances to focus more on acceleration while "hard" instances focus more on accuracy. Experimental results show that our method outperforms other baselines on various natural language understanding and generation tasks.
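For intuition, a minimal sketch of the inference-time exit decision that such a policy network makes is given below; the layer count, hidden size, first-token pooling, and 0.5 threshold are illustrative assumptions rather than the paper's exact architecture, and training of the policy (the reinforcement-learning part) is omitted.

```python
# Sketch of policy-driven early exiting at inference time: after each layer,
# a small policy head looks at the first-token hidden state and decides whether
# to stop and let that layer's classifier produce the prediction.
# Layer count, hidden size, and the 0.5 threshold are illustrative assumptions.
import torch
import torch.nn as nn

class EarlyExitModel(nn.Module):
    def __init__(self, num_layers=12, hidden=768, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden, nhead=12, batch_first=True)
            for _ in range(num_layers)
        )
        self.classifiers = nn.ModuleList(nn.Linear(hidden, num_classes) for _ in range(num_layers))
        self.policies = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_layers))

    @torch.no_grad()
    def infer(self, x, threshold=0.5):
        # x: (batch=1, seq_len, hidden) token embeddings
        for i, layer in enumerate(self.layers):
            x = layer(x)
            pooled = x[:, 0]                                  # first-token representation
            exit_prob = torch.sigmoid(self.policies[i](pooled))
            if exit_prob.item() > threshold or i == len(self.layers) - 1:
                return self.classifiers[i](pooled), i         # prediction and exit layer

model = EarlyExitModel()
logits, exit_layer = model.infer(torch.randn(1, 16, 768))
print(f"exited at layer {exit_layer}, logits shape {tuple(logits.shape)}")
```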