
Collaborating Authors

Yu, Pengfei


The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

arXiv.org Artificial Intelligence

Hallucination is a persistent challenge in large language models (LLMs): even with rigorous quality control, models often generate distorted facts. This paradox, in which errors persist despite high-quality training data, calls for a deeper understanding of the underlying LLM mechanisms. To address it, we propose a novel concept: knowledge overshadowing, where a model's dominant knowledge can obscure less prominent knowledge during text generation, causing the model to fabricate inaccurate details. Building on this idea, we introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. Central to our approach is the log-linear law, which predicts that the rate of factual hallucination increases linearly with the logarithmic scale of (1) Knowledge Popularity, (2) Knowledge Length, and (3) Model Size. The law provides a means to preemptively quantify hallucinations, offering foresight into their occurrence even before model training or inference. Building on the overshadowing effect, we propose a new decoding strategy, CoDa, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%). Our findings not only deepen understanding of the underlying mechanisms behind hallucinations but also provide actionable insights for developing more predictable and controllable language models.
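
As stated, the law is linear in the logarithms of the three quantities. A minimal way to write it down (our notation; the symbols and functional details here are illustrative, not the paper's exact formulation) is:

```latex
R(P, L, S) \;\approx\; \alpha \log P + \beta \log L + \gamma \log S + c
```

where R is the predicted hallucination rate, P the knowledge popularity, L the knowledge length, S the model size, and \alpha, \beta, \gamma, c are fitted constants. Under this form, doubling popularity shifts the predicted rate by a fixed increment \alpha \log 2, which is what makes such a law usable before training or inference.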


RPGBENCH: Evaluating Large Language Models as Role-Playing Game Engines

arXiv.org Artificial Intelligence

We present RPGBench, the first benchmark designed to evaluate large language models (LLMs) as text-based role-playing game (RPG) engines. RPGBench comprises two core tasks: Game Creation (GC) and Game Simulation (GS). In GC, an LLM must craft a valid and playable RPG world using a structured event-state representation, ensuring logical coherence and proper termination conditions. In GS, the LLM simulates interactive gameplay across multiple rounds while consistently updating states and enforcing game rules. To comprehensively assess performance, RPGBench integrates objective and subjective evaluation methodologies. Objective measures verify adherence to event mechanics and check variable updates without requiring human intervention. Subjective measures, such as content interestingness, action quality, and role-playing capability, are evaluated via an LLM-as-a-judge framework, where a strong LLM grades each candidate's outputs. Empirical results demonstrate that state-of-the-art LLMs can produce engaging stories but often struggle to implement consistent, verifiable game mechanics, particularly in long or complex scenarios. By combining structured, rule-based assessments with LLM-based judgments, RPGBench provides a new standard for evaluating how well LLMs can balance creativity, coherence, and complexity in text-based RPGs, opening avenues for more immersive and controllable interactive storytelling.
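
To make the Game Creation task concrete, here is a hypothetical sketch of what a structured event-state representation could look like; the field names and rules are our illustration, not RPGBench's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """A game event: preconditions on state, effects applied on firing."""
    name: str
    preconditions: dict[str, object]  # e.g. {"has_key": True}
    effects: dict[str, object]        # state updates applied when fired
    terminal: bool = False            # does this event end the game?

@dataclass
class GameWorld:
    state: dict[str, object] = field(default_factory=dict)
    events: list[Event] = field(default_factory=list)

    def fire(self, event: Event) -> bool:
        """Apply an event if its preconditions hold; return success."""
        if all(self.state.get(k) == v for k, v in event.preconditions.items()):
            self.state.update(event.effects)
            return True
        return False
```

Under a schema like this, objective evaluation can be mechanical: verify that each fired event's preconditions held, that state updates match declared effects, and that at least one terminal event is reachable.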


Gene-Metabolite Association Prediction with Interactive Knowledge Transfer Enhanced Graph for Metabolite Production

arXiv.org Artificial Intelligence

In the rapidly evolving field of metabolic engineering, the quest for efficient and precise gene target identification for enhancing metabolite production presents significant challenges. Traditional approaches, whether knowledge-based or model-based, are notably time-consuming and labor-intensive, due to the vast scale of the research literature and the approximate nature of genome-scale metabolic model (GEM) simulations. Therefore, we propose a new task, Gene-Metabolite Association Prediction based on metabolic graphs, to automate candidate gene discovery by predicting the association between a given metabolite and candidate genes, and we present the first benchmark, containing 2474 metabolites and 1947 genes of two commonly used microorganisms, Saccharomyces cerevisiae (SC) and Issatchenkia orientalis (IO). This task is challenging due to the incompleteness of the metabolic graphs and the heterogeneity among distinct metabolisms. To overcome these limitations, we propose an Interactive Knowledge Transfer mechanism based on Metabolism Graph (IKT4Meta), which improves association prediction accuracy by integrating knowledge from different metabolism graphs. First, to build a bridge between the two graphs for knowledge transfer, we utilize Pretrained Language Models (PLMs) with external knowledge of genes and metabolites to help generate inter-graph links, significantly alleviating the impact of heterogeneity. Second, we propagate intra-graph links from the different metabolic graphs using the inter-graph links as anchors. Finally, we conduct gene-metabolite association prediction on the enriched metabolism graphs, which integrate knowledge from multiple microorganisms. Experiments on both organisms demonstrate that our proposed methodology outperforms baselines by up to 12.3% across various link prediction frameworks.
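
The first step, PLM-generated inter-graph links, can be pictured with a short sketch. The embedding model, the node descriptions, and the 0.9 similarity threshold below are illustrative assumptions, not the choices made in IKT4Meta:

```python
# Sketch: generate inter-graph anchor links by matching textual
# descriptions of nodes across the two metabolic graphs.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

sc_nodes = {"SC_G17": "glucose-6-phosphate isomerase, glycolysis"}    # SC graph
io_nodes = {"IO_G42": "phosphoglucose isomerase, glycolytic enzyme"}  # IO graph

sc_ids, sc_texts = zip(*sc_nodes.items())
io_ids, io_texts = zip(*io_nodes.items())
sim = util.cos_sim(model.encode(list(sc_texts)), model.encode(list(io_texts)))

# Keep confident cross-graph matches as anchors for link propagation.
anchors = [(sc_ids[i], io_ids[j])
           for i in range(sim.shape[0])
           for j in range(sim.shape[1])
           if sim[i, j] > 0.9]
```

The resulting anchor pairs then serve as the bridges along which intra-graph links are propagated between the SC and IO graphs in the second step.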


Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

arXiv.org Artificial Intelligence

Hallucination is often regarded as a major impediment to using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon "knowledge overshadowing": when we query knowledge from a language model with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained and fine-tuned models, over a wide range of LM families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular conditions) and the length of the dominant condition's description, consistent with our derived generalization bound. Finally, we propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach achieves up to 82% F1 for hallucination anticipation and 11.2% to 39.4% hallucination control, across different models and datasets.
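
The self-contrastive decoding idea can be sketched as contrasting next-token logits for the full multi-condition query against logits for the dominant condition alone, so tokens explained purely by the overshadowing condition are down-weighted. This is a sketch of the general contrastive mechanism, not the paper's exact algorithm; `alpha` and the choice of reduced prompt are assumptions:

```python
import torch

@torch.no_grad()
def self_contrastive_logits(model, tokenizer, full_query, dominant_only, alpha=1.0):
    """Next-token scores that penalize what the dominant condition alone
    would predict (alpha is a hypothetical contrast strength)."""
    full = tokenizer(full_query, return_tensors="pt")
    dom = tokenizer(dominant_only, return_tensors="pt")
    logits_full = model(**full).logits[0, -1]  # all conditions present
    logits_dom = model(**dom).logits[0, -1]    # dominant condition only
    return logits_full - alpha * logits_dom    # contrastive next-token scores
```

Because both passes use the same frozen model, the method stays training-free: only the decoding-time scores change.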


Why Does New Knowledge Create Messy Ripple Effects in LLMs?

arXiv.org Artificial Intelligence

Extensive previous research has focused on post-training knowledge editing (KE) for language models (LMs) to keep their knowledge accurate and up-to-date. One desired property, and open question, in KE is for edited LMs to correctly handle ripple effects, where the LM is expected to answer logically related knowledge accurately. In this paper, we answer the question of why most KE methods still create messy ripple effects. We conduct extensive analysis and identify a salient indicator, GradSim, that effectively reveals when and why updated knowledge ripples in LMs. GradSim is computed as the cosine similarity between the gradients of the original fact and its related knowledge. We observe a strong positive correlation between ripple-effect performance and GradSim across different LMs, KE methods, and evaluation metrics. Further investigation into three counter-intuitive failure cases (Negation, Over-Ripple, Multi-Lingual) of ripple effects demonstrates that these failures are often associated with very low GradSim. This finding validates that GradSim is an effective indicator of when knowledge ripples in LMs.
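
GradSim is simple to compute once the two losses are in hand. A minimal PyTorch sketch, assuming `loss_fact` and `loss_related` come from separate forward passes on the original fact and its related knowledge:

```python
import torch
import torch.nn.functional as F

def grad_vector(model, loss):
    """Flatten the gradients of `loss` w.r.t. all trainable parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def gradsim(model, loss_fact, loss_related):
    """Cosine similarity between the gradient of the edited fact's loss
    and the gradient of a logically related fact's loss."""
    g1 = grad_vector(model, loss_fact)
    g2 = grad_vector(model, loss_related)
    return F.cosine_similarity(g1, g2, dim=0).item()
```

A value near 1 means updating the fact pushes the parameters in nearly the same direction as learning the related knowledge would, which is when ripples propagate cleanly.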


MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

arXiv.org Artificial Intelligence

Rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance in understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and the low precision of existing annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP), which employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and their alignment with popular semantics. Using this method, we built a dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in music description and empirically demonstrated the effectiveness of the annotated data for fine-tuning LLMs. Finally, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music. All data related to the benchmark and the code for scoring have been open-sourced.


PET Tracer Conversion among Brain PET via Variable Augmented Invertible Network

arXiv.org Artificial Intelligence

Positron emission tomography (PET) is an essential tool for the diagnosis of encephalopathy and for brain science research. However, it suffers from a limited choice of tracers. With the wide application of PET imaging in neuropsychiatric treatment, 6-18F-fluoro-3,4-dihydroxy-L-phenylalanine (DOPA) has been found to be more effective than 18F-labeled fluorodeoxyglucose (FDG) in this field. Nevertheless, due to the complexity of its preparation and other limitations, DOPA is far less widely used than FDG. To address this issue, we develop a tracer-conversion invertible neural network (TC-INN) for image projection that maps FDG images to DOPA images through deep learning, so that additional diagnostic information is obtained by generating DOPA PET images from FDG. Specifically, the proposed TC-INN operates in two separate phases: one for training on traceable data, the other for rebuilding new data. During training, the reference DOPA PET image is used as the learning target for the tracer-conversion network, while the invertible network iteratively estimates the resulting DOPA PET data and compares it to the reference. Notably, the invertible model employs a variable augmentation technique to achieve better generation performance. Moreover, image registration must be performed before training, owing to the angular deviation between the acquired FDG and DOPA data. Experimental results exhibit excellent generation capability in mapping between FDG and DOPA, suggesting that PET tracer conversion has great potential where the choice of tracers is limited.
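
The abstract does not detail the TC-INN architecture; as background, invertible networks are typically built from coupling blocks whose inverse is exact. A minimal additive coupling layer (a generic NICE/RealNVP-style illustration, not TC-INN itself) looks like:

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """Minimal invertible coupling block (illustrative, not TC-INN).
    Splits channels in half; one half is shifted by a function of the
    other, so the inverse can be computed exactly."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels // 2, channels // 2, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels // 2, channels // 2, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.net(x1)], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.net(y1)], dim=1)
```

Stacking such blocks, with the channel split alternated between layers, yields a bijective image-to-image mapping of the kind an invertible FDG-to-DOPA converter would be built from.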


Defining a New NLP Playground

arXiv.org Artificial Intelligence

The recent explosion in the performance of large language models (LLMs) has changed the field of Natural Language Processing (NLP) more abruptly and seismically than any other shift in the field's 80-year history. This has raised concerns that the field will become homogenized and resource-intensive. The new status quo has put many academic researchers, especially PhD students, at a disadvantage. This paper aims to define a new NLP playground by proposing 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications.


RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved promising performance on arithmetic reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting. However, LLMs face challenges in maintaining factual consistency during reasoning, exhibiting tendencies toward condition overlooking, question misinterpretation, and condition hallucination on given problems. Existing methods use coarse-grained feedback (e.g., whether the answer is correct) to improve factual consistency. In this work, we propose RCoT (Reversing Chain-of-Thought), a novel method to improve LLMs' reasoning abilities by automatically detecting and rectifying factual inconsistency in LLMs' generated solutions. To detect factual inconsistency, RCoT first asks the LLM to reconstruct the problem based on its generated solution. Fine-grained comparisons between the original problem and the reconstructed problem then expose the factual inconsistencies in the original solution. To rectify the solution, RCoT formulates the detected factual inconsistencies into fine-grained feedback that guides the LLM in revising its solution. Experimental results demonstrate improvements of RCoT over standard CoT, Self-Consistency, and Self-Refine across seven arithmetic datasets. Moreover, we find that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities (e.g., ChatGPT reaches 94.6% accuracy on GSM8K), encouraging the community to further explore fine-grained feedback generation methods.
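
The detect-and-rectify loop can be sketched in a few lines. Here `llm` is a hypothetical prompt-to-text callable, and the prompts are paraphrases of the idea, not the paper's actual templates:

```python
def rcot(problem: str, llm) -> str:
    """Sketch of the RCoT loop under the assumptions above."""
    solution = llm(f"Solve step by step:\n{problem}")
    # Reverse step: reconstruct the problem from the solution alone.
    reconstructed = llm(f"State the problem this solution answers:\n{solution}")
    # Fine-grained comparison exposes overlooked, altered, or invented conditions.
    feedback = llm(
        "List factual differences (missing, changed, or invented conditions) "
        f"between:\nORIGINAL: {problem}\nRECONSTRUCTED: {reconstructed}"
    )
    if "no differences" in feedback.lower():
        return solution
    # Rectify: revise the solution using the fine-grained feedback.
    return llm(f"Problem: {problem}\nDraft solution: {solution}\n"
               f"Fix these inconsistencies and re-solve:\n{feedback}")
```

The key design choice is that the feedback is itemized per condition rather than a single correct/incorrect signal, which is what distinguishes RCoT from coarse-grained self-refinement.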


Self Information Update for Large Language Models through Mitigating Exposure Bias

arXiv.org Artificial Intelligence

Current LLMs have demonstrated remarkable capabilities in addressing users' requests for various types of information. However, these models are limited by the most recent data available in their pretraining corpora, rendering them incapable of providing up-to-date information. Retraining LLMs from scratch is cost-prohibitive, and the effectiveness of continual fine-tuning on new corpora has not been thoroughly examined. Additionally, current update procedures typically demand significant human input to prepare the information in a more structured format, such as knowledge triples, conversational data, or responses with human feedback. In this study, we conduct a comprehensive examination of a novel self information update task in LLMs, which requires only the provision of informative text corpora; for instance, the latest news articles can be used to update the LLMs' existing knowledge. We define the self information update task and assess the continual fine-tuning approach for this purpose. We observe that naive continual fine-tuning can be problematic due to LLMs' exposure bias, which prioritizes existing information over the new information we aim to integrate and leads to incorrect reasoning chains that ultimately diminish the efficacy of information updates. Based on our analysis, we propose an effective method to mitigate exposure bias by incorporating the selection of relevant facts into the training losses. Furthermore, we develop a dataset to evaluate information updates, derived from news articles published after March 2023. Experimental results demonstrate that our proposed approach significantly increases the factual consistency score (on a 0-to-1 scale) by 0.16, while having minimal impact on performance for instructions not directly related to the new information.
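
One way to picture "incorporating the selection of relevant facts into the training losses" is to up-weight the language-modeling loss on tokens that carry the new facts. The sketch below assumes a `fact_mask` marking those tokens; the mask construction and the weighting scheme are our assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def fact_weighted_lm_loss(logits, labels, fact_mask, fact_weight=2.0):
    """Causal-LM loss that up-weights tokens belonging to selected new
    facts, so fine-tuning favors the update over the model's stale
    parametric knowledge. `fact_mask` is 1 where a token is part of a
    relevant fact; `fact_weight` is an illustrative scalar."""
    shift_logits = logits[:, :-1, :]   # predict token t from positions < t
    shift_labels = labels[:, 1:]
    shift_mask = fact_mask[:, 1:].float()
    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).view(shift_labels.shape)
    weights = 1.0 + (fact_weight - 1.0) * shift_mask
    return (weights * per_token).sum() / weights.sum()
```

Intuitively, this counteracts exposure bias by making the gradient signal from the new information louder than the signal that merely re-confirms what the model already believed.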