AITopics | Fan, Yuchun

Collaborating Authors

Fan, Yuchun

Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization

Huang, Lei, Feng, Xiaocheng, Ma, Weitao, Fan, Yuchun, Feng, Xiachong, Ye, Yangfan, Zhong, Weihong, Gu, Yuxuan, Wang, Baoxin, Wu, Dayong, Hu, Guoping, Qin, Bing

arXiv.org Artificial Intelligence, Jan-23-2025

Ensuring contextual faithfulness in retrieval-augmented large language models (LLMs) is crucial for building trustworthy information-seeking systems, particularly in long-form question-answering (LFQA) scenarios. In this work, we identify a salient correlation between LFQA faithfulness and retrieval heads, a set of attention heads responsible for retrieving contextual information. Leveraging this insight, we propose RHIO, a framework designed to teach LLMs to explicitly discriminate between faithful and unfaithful generations. RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads. Then, these samples are incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens. Furthermore, these control tokens are leveraged to self-induce contrastive outputs, amplifying their difference through contrastive decoding. Additionally, to facilitate the evaluation of contextual faithfulness, we also introduce GroundBench, a comprehensive benchmark compiled from five existing LFQA datasets. Extensive experimental results on GroundBench demonstrate that RHIO significantly improves faithfulness, even outperforming GPT-4o.
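The contrastive decoding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give RHIO's exact scoring rule, so the formula below is a commonly used generic form of contrastive decoding and is an assumption, as are the function names and the `alpha` weight. It takes next-token logits from a "faithful" pass and an "unfaithful" pass and favors tokens that the first prefers over the second.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrastive_scores(faithful_logits, unfaithful_logits, alpha=1.0):
    """Generic contrastive score (assumed form, not taken from the paper):
    (1 + alpha) * log p_faithful - alpha * log p_unfaithful."""
    lp_f = log_softmax(faithful_logits)
    lp_u = log_softmax(unfaithful_logits)
    return [(1 + alpha) * f - alpha * u for f, u in zip(lp_f, lp_u)]

def pick_token(faithful_logits, unfaithful_logits, alpha=1.0):
    """Greedy choice of the token index with the highest contrastive score."""
    scores = contrastive_scores(faithful_logits, unfaithful_logits, alpha)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy vocabulary of three tokens. Token 0 is slightly preferred by the
# faithful pass but strongly preferred by the unfaithful pass.
faithful = [2.0, 1.9, 0.0]
unfaithful = [3.0, 0.0, 0.0]
choice = pick_token(faithful, unfaithful, alpha=1.0)  # selects token 1
```

With `alpha=0.0` the score reduces to the faithful pass alone, so the same inputs select token 0; raising `alpha` increases the penalty on tokens the unfaithful pass likes.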

large language model, machine learning, natural language

arXiv:2501.13573

Country:

Asia > China (0.28)
North America > United States (0.28)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Fan, Yuchun, Mu, Yongyu, Wang, Yilin, Huang, Lei, Ruan, Junhao, Li, Bei, Xiao, Tong, Huang, Shujian, Feng, Xiaocheng, Zhu, Jingbo

arXiv.org Artificial Intelligence, Jan-7-2025

Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter, two-stage training paradigm to teach models to first understand non-English questions and then reason. However, this method suffers from both substantial computational cost and catastrophic forgetting. The fundamental cause is that, with the primary goal of enhancing multilingual comprehension, an excessive number of irrelevant layers and parameters are tuned during the first stage. Given our finding that the representation learning of languages is conducted only in lower-level layers, we propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, comprising 6.5-8% of all parameters in 7B and 13B LLMs, and achieves better average performance than all strong baselines across 10 languages. Meanwhile, SLAM involves only one training stage, reducing training time by 4.1-11.9× compared to the two-stage method.
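The idea of tuning only the feed-forward sub-layers of a few selected layers can be sketched as a parameter-selection step. This is an illustration under stated assumptions, not SLAM's implementation: the parameter naming scheme (`layers.<index>.<sublayer>.<weight>`), the `.mlp.` marker, and the toy parameter counts are all hypothetical, and how the layers are identified is not covered here.

```python
def select_trainable(param_sizes, layers, marker=".mlp."):
    """Return names of parameters to fine-tune: feed-forward weights
    (names containing `marker`) inside the chosen layer indices.
    Assumes hypothetical names of the form 'layers.<idx>.<sublayer>.<weight>'."""
    chosen = set()
    for name in param_sizes:
        parts = name.split(".")
        if parts[0] == "layers" and int(parts[1]) in layers and marker in name:
            chosen.add(name)
    return chosen

def trainable_fraction(param_sizes, chosen):
    """Share of all parameters that would be updated."""
    total = sum(param_sizes.values())
    return sum(param_sizes[name] for name in chosen) / total

# Toy model: an embedding plus 4 layers, each with attention and feed-forward weights.
toy_model = {"embed.weight": 16}
for i in range(4):
    toy_model[f"layers.{i}.attn.weight"] = 4
    toy_model[f"layers.{i}.mlp.weight"] = 8

# Fine-tune only the feed-forward sub-layers of the two lowest layers.
chosen = select_trainable(toy_model, layers={0, 1})
fraction = trainable_fraction(toy_model, chosen)  # 16 of 64 parameters
```

In a real training setup, every parameter outside `chosen` would be frozen, which is where the savings in compute and the reduced risk of forgetting would come from.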

large language model, machine learning, natural language

arXiv:2501.03681

Country:

North America (0.93)
Europe > Austria > Vienna (0.14)
Asia > China > Liaoning Province (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Advancing Large Language Model Attribution through Self-Improving

Huang, Lei, Feng, Xiaocheng, Ma, Weitao, Zhao, Liang, Fan, Yuchun, Zhong, Weihong, Xu, Dongliang, Yang, Qing, Liu, Hongtao, Qin, Bing

arXiv.org Artificial Intelligence, Oct-17-2024

Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive to obtain. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model's attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations or more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.

attribution, large language model, machine learning

arXiv:2410.13298

Country:

North America > United States (0.93)
Asia > China (0.68)

Genre: Research Report (1.00)

Industry:

Media (0.47)
Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Augmenting Large Language Model Translators via Translation Memories

Mu, Yongyu, Reheman, Abudurexiti, Cao, Zhiquan, Fan, Yuchun, Li, Bei, Li, Yinqiao, Xiao, Tong, Zhang, Chunliang, Zhu, Jingbo

arXiv.org Artificial Intelligence, May-27-2023

Using translation memories (TMs) as prompts is a promising approach to in-context learning of machine translation models. In this work, we take a step towards prompting large language models (LLMs) with TMs and making them better translators. We find that the ability of LLMs to "understand" prompts is indeed helpful for making better use of TMs. Experiments show that the results of a pre-trained LLM translator can be greatly improved by using high-quality TM-based prompts. These results are even comparable to those of state-of-the-art NMT systems that have access to large-scale in-domain bilingual data and are well tuned on the downstream tasks.
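A TM-based prompt can be sketched as retrieving the most similar stored sentence pairs and placing them before the sentence to translate. The sketch below is only an illustration: the similarity measure (`difflib.SequenceMatcher` from the Python standard library), the prompt template, the language names, and the toy memory are assumptions and are not taken from the paper.

```python
from difflib import SequenceMatcher

def retrieve_tm(source, memory, k=2):
    """Return the k (similarity, src, tgt) entries whose source side is
    most similar to `source`, using a character-level similarity ratio."""
    scored = [
        (SequenceMatcher(None, source, src).ratio(), src, tgt)
        for src, tgt in memory
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

def build_prompt(source, matches, src_lang="English", tgt_lang="German"):
    """Format retrieved pairs as demonstrations, then the sentence to
    translate. The template is hypothetical."""
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for _, src, tgt in matches]
    blocks.append(f"{src_lang}: {source}\n{tgt_lang}:")
    return "\n\n".join(blocks)

# Toy translation memory of (source, target) pairs.
memory = [
    ("The cat sleeps on the mat.", "Die Katze schläft auf der Matte."),
    ("I like green tea.", "Ich mag grünen Tee."),
    ("Where is the train station?", "Wo ist der Bahnhof?"),
]

query = "The cat sleeps on the sofa."
matches = retrieve_tm(query, memory, k=1)
prompt = build_prompt(query, matches)
```

The resulting `prompt` string would be passed to an LLM of choice; the quality of the retrieved pairs is what the abstract identifies as driving the improvement.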

artificial intelligence, natural language, translation

arXiv:2305.17367

Country:

Europe (0.93)
North America > United States > Louisiana (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
