AITopics | Feng, Xiaocheng

Feng, Xiaocheng

Advancing Large Language Model Attribution through Self-Improving

Huang, Lei, Feng, Xiaocheng, Ma, Weitao, Zhao, Liang, Fan, Yuchun, Zhong, Weihong, Xu, Dongliang, Yang, Qing, Liu, Hongtao, Qin, Bing

arXiv.org Artificial Intelligence | Oct-17-2024

Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive to obtain. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model's attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations or more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.

attribution, large language model, machine learning, (19 more...)

2410.13298

Country:

North America > United States (0.93)
Asia > China (0.68)

Genre: Research Report (1.00)

Industry:

Media (0.47)
Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization

Ye, Yangfan, Feng, Xiachong, Feng, Xiaocheng, Ma, Weitao, Qin, Libo, Xu, Dongliang, Yang, Qing, Liu, Hongtao, Qin, Bing

arXiv.org Artificial Intelligence | Oct-5-2024

News summarization in today's global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios, as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, MCMS, which encapsulates these real-world requirements in one setting. Nevertheless, the lack of a benchmark inhibits researchers from adequately studying this problem. To tackle this, we have constructed the GLOBESUMM dataset by first collecting a wealth of multilingual news reports and restructuring them into an event-centric format. Additionally, we introduce the method of protocol-guided prompting for high-quality and cost-effective reference annotation. In MCMS, we also highlight the challenge of conflicts between news reports, in addition to the issues of redundancies and omissions, which further adds to the complexity of GLOBESUMM. Through extensive experimental analysis, we validate the quality of our dataset and elucidate the inherent challenges of the task. We firmly believe that GLOBESUMM, given its challenging nature, will greatly contribute to the multilingual communities and the evaluation of LLMs.

large language model, machine learning, natural language, (18 more...)

2410.04087

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Extending Context Window of Large Language Models from a Distributional Perspective

Wu, Yingsheng, Gu, Yuxuan, Feng, Xiaocheng, Zhong, Weihong, Xu, Dongliang, Yang, Qing, Liu, Hongtao, Qin, Bing

arXiv.org Artificial Intelligence | Oct-3-2024

Scaling the rotary position embedding (RoPE) has become a common method for extending the context window of RoPE-based large language models (LLMs). However, existing scaling methods often rely on empirical approaches and lack a profound understanding of the internal distribution within RoPE, resulting in suboptimal performance when extending the context window length. In this paper, we propose to optimize the context window extension task from the view of rotary angle distribution. Specifically, we first estimate the distribution of the rotary angles within the model and analyze the extent to which length extension perturbs this distribution. Then, we present a novel extension strategy that minimizes the disturbance between rotary angle distributions to maintain consistency with the pre-training phase, enhancing the model's capability to generalize to longer sequences. Experimental comparisons with strong baseline methods show that our approach reduces the distributional disturbance by up to 72% when extending LLaMA2's context window to 8k, and by up to 32% when extending to 16k. On the LongBench-E benchmark, our method achieves an average improvement of up to 4.33% over existing state-of-the-art methods. Furthermore, our method maintains the model's performance on the Hugging Face Open LLM benchmark after context window extension, with an average performance fluctuation ranging only from -0.12 to +0.22.
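
For readers unfamiliar with the rotary angles mentioned in the abstract above, the following is a minimal sketch of standard RoPE with linear position scaling. It is not the paper's distribution-based strategy; the function names, the `base` of 10000, and the `scale` parameter (linear position interpolation, one common empirical baseline) are assumptions chosen for illustration.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotary angles for one token position.

    Angle i is (position / scale) * base**(-2i / head_dim). With scale > 1 the
    positions are compressed (linear position interpolation), an empirical
    baseline and NOT the distribution-based strategy of the paper.
    """
    return [
        (position / scale) * base ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by the i-th rotary angle."""
    rotated = []
    for i, theta in enumerate(rope_angles(position, len(vec), base, scale)):
        x, y = vec[2 * i], vec[2 * i + 1]
        rotated.append(x * math.cos(theta) - y * math.sin(theta))
        rotated.append(x * math.sin(theta) + y * math.cos(theta))
    return rotated


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy query/key vectors (made-up values, head_dim = 4).
query = [1.0, 0.0, 0.5, -0.5]
key = [0.3, 0.7, -0.2, 0.1]

# The attention score depends only on the relative offset (2 in both cases).
score_near = dot(apply_rope(query, 5), apply_rope(key, 3))
score_far = dot(apply_rope(query, 105), apply_rope(key, 103))

# Scaling by 2 maps position 8192 onto the angles of position 4096.
scaled = rope_angles(8192, head_dim=4, scale=2.0)
unscaled = rope_angles(4096, head_dim=4)
```

Under linear scaling every angle shrinks by the same factor, so positions beyond the pre-training window reuse angle values the model has already seen, at the cost of changing how those angles are distributed. That change in distribution is the quantity the abstract proposes to measure and minimize.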

artificial intelligence, large language model, natural language, (16 more...)

2410.0149

Country:

North America > United States (0.28)
Asia > China (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report > Promising Solution (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

Zhong, Weihong, Feng, Xiaocheng, Zhao, Liang, Li, Qiming, Huang, Lei, Gu, Yuxuan, Ma, Weitao, Xu, Yuan, Qin, Bing

arXiv.org Artificial Intelligence | Jun-29-2024

Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, will LVLMs be misled and respond incorrectly, even though the ground visual information exists? To answer this, we propose a framework called MMHalSnowball to evaluate LVLMs' behaviors when encountering generated hallucinations, where LVLMs are required to answer specific visual questions within a curated hallucinatory conversation. Crucially, our experiment shows that the performance of open-source LVLMs drops by at least 31%, indicating that LVLMs are prone to accept the generated hallucinations and make false claims that they would not have supported without distractions. We term this phenomenon Multimodal Hallucination Snowballing. To mitigate this, we further propose a training-free method called Residual Visual Decoding, where we revise the output distribution of LVLMs with the one derived from the residual visual input, providing models with direct access to the visual information. Experiments show that our method can mitigate more than 24% of the snowballed multimodal hallucination while maintaining capabilities.

large language model, machine learning, natural language, (20 more...)

2407.00569

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Tennis (0.93)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Rethinking Entity-level Unlearning for Large Language Models

Ma, Weitao, Feng, Xiaocheng, Zhong, Weihong, Huang, Lei, Ye, Yangfan, Qin, Bing

arXiv.org Artificial Intelligence | Jun-22-2024

Large language model unlearning has gained increasing attention due to its potential to mitigate security and privacy concerns. Current research predominantly focuses on Instance-level unlearning, specifically aiming at forgetting predefined instances of sensitive content. However, a notable gap still exists in exploring the deletion of complete entity-related information, which is crucial in many real-world scenarios, such as copyright protection. To this end, we propose a novel task of Entity-level unlearning, where the entity-related knowledge within the target model is supposed to be entirely erased. Given the challenge of practically accessing all entity-related knowledge within a model, we begin by simulating entity-level unlearning scenarios through fine-tuning models to introduce pseudo entities. Following this, we develop baseline methods inspired by trending unlearning techniques and conduct a detailed comparison of their effectiveness in this task. Extensive experiments reveal that current unlearning algorithms struggle to achieve effective entity-level unlearning. Additionally, our analyses further indicate that entity-related knowledge injected through fine-tuning is more susceptible than original entities from pre-training during unlearning, highlighting the necessity for more thorough pseudo-entity injection methods to make them closer to pre-trained knowledge.

large language model, machine learning, natural language, (19 more...)

2406.15796

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Huang, Yichong, Feng, Xiaocheng, Li, Baohang, Xiang, Yang, Wang, Hui, Qin, Bing, Liu, Ting

arXiv.org Artificial Intelligence | May-30-2024

Large language models (LLMs) exhibit complementary strengths in various tasks, motivating research on LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate answers, which poses a great challenge to generalization on unseen data distributions. Besides, prior methods use textual responses as the communication medium, ignoring the valuable information in the internal representations. In this work, we propose a training-free ensemble framework, DeePEn, which fuses the informative probability distributions yielded by different LLMs at each decoding step. Unfortunately, the vocabulary discrepancy between heterogeneous LLMs makes directly averaging the distributions infeasible because of token misalignment. To address this challenge, DeePEn maps the probability distribution of each model from its own probability space to a universal relative space based on relative representation theory, and performs aggregation. Next, we devise a search-based inverse transformation to transform the aggregated result back to the probability space of one of the ensembling LLMs (the main model), in order to determine the next token. We conduct extensive experiments on ensembles of different numbers of LLMs, ensembles of LLMs with different architectures, and ensembles between an LLM and a specialist model. Experimental results show that (i) DeePEn achieves consistent improvements across six benchmarks covering subject examination, reasoning, and knowledge, (ii) a well-performing specialist model can benefit from a less effective LLM through distribution fusion, and (iii) DeePEn has complementary strengths with other ensemble methods such as voting.
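
The mapping-and-aggregation step described in the abstract above might be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: it assumes that a token's relative representation is its cosine similarity to a set of anchor tokens shared by both vocabularies, that a distribution is mapped by taking the probability-weighted sum of those representations, and that aggregation is a plain average. All names and embedding values are hypothetical, and the search-based inverse transformation is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def to_relative_space(probs, embeddings, anchor_ids):
    """Map a next-token distribution into the shared anchor space.

    Assumption: each token is represented by its cosine similarity to every
    anchor token, and the distribution maps to the probability-weighted sum.
    """
    anchors = [embeddings[i] for i in anchor_ids]
    relative = [0.0] * len(anchors)
    for prob, emb in zip(probs, embeddings):
        for j, anchor in enumerate(anchors):
            relative[j] += prob * cosine(emb, anchor)
    return relative


def aggregate(relative_vectors):
    """Average several relative-space vectors of equal length."""
    count = len(relative_vectors)
    return [sum(column) / count for column in zip(*relative_vectors)]


# Toy models with different vocabulary sizes and embedding widths (made up).
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, dim 2
emb_b = [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0],
         [0.0, 0.0, 3.0], [1.0, 1.0, 0.0]]    # 4 tokens, dim 3

# Indices of the same two shared anchor tokens in each vocabulary,
# listed in the same anchor order for both models.
anchors_a = [0, 1]
anchors_b = [1, 0]

probs_a = [1.0, 0.0, 0.0]       # model A: all mass on the first anchor
probs_b = [1.0, 0.0, 0.0, 0.0]  # model B: all mass on the second anchor

rel_a = to_relative_space(probs_a, emb_a, anchors_a)
rel_b = to_relative_space(probs_b, emb_b, anchors_b)
fused = aggregate([rel_a, rel_b])
```

Both relative vectors have one entry per anchor regardless of vocabulary size or embedding width, which is what makes averaging well defined even though the two models' token indices do not line up.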

large language model, machine learning, natural language, (19 more...)

2404.12715

Country:

Asia (0.67)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Relay Decoding: Concatenating Large Language Models for Machine Translation

Fu, Chengpeng, Feng, Xiaocheng, Huang, Yichong, Huo, Wenshuai, Li, Baohang, Wang, Hui, Qin, Bin, Liu, Ting

arXiv.org Artificial Intelligence | May-5-2024

Leveraging large language models for machine translation has demonstrated promising results. However, it does require the large language models to possess the capability of handling both the source and target languages in machine translation. When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitigate these expenses, we propose an innovative approach called RD (Relay Decoding), which entails concatenating two distinct large models that individually support the source and target languages. By incorporating a simple mapping layer to facilitate the connection between these two models and utilizing a limited amount of parallel data for training, we successfully achieve superior results in the machine translation task. Experimental results conducted on the Multi30k and WikiMatrix datasets validate the effectiveness of our proposed method.

large language model, natural language, translation, (12 more...)

2405.02933

Country:

Europe (1.00)
Asia > China (0.29)

Genre: Research Report (1.00)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Aligning Translation-Specific Understanding to General Understanding in Large Language Models

Huang, Yichong, Feng, Xiaocheng, Li, Baohang, Fu, Chengpeng, Huo, Wenshuai, Liu, Ting, Qin, Bing

arXiv.org Artificial Intelligence | Jan-10-2024

Although large language models (LLMs) have shown surprising language understanding and generation capabilities, they have yet to bring a revolutionary advance to the field of machine translation. One potential cause of the limited performance is the misalignment between translation-specific understanding and general understanding inside LLMs. To align the translation-specific understanding to the general one, we propose a novel translation process, xIoD (Cross-Lingual Interpretation of Difficult words), which explicitly incorporates the model's general understanding of content that incurs inconsistent understanding to guide the translation. Specifically, xIoD performs cross-lingual interpretation of the difficult-to-translate words and enhances the translation with the generated interpretations. Furthermore, we reframe the external tools of quality estimation (QE) to tackle the challenges xIoD faces in detecting difficult words and generating helpful interpretations. We conduct experiments on the self-constructed benchmark ChallengeMT, which includes cases in which multiple SOTA translation systems consistently underperform. Experimental results show the effectiveness of xIoD, which yields improvements of up to +3.85 COMET.

large language model, machine learning, natural language, (17 more...)

2401.05072

Country:

North America > Canada (0.15)
North America > United States (0.14)
Europe > Finland (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Length Extrapolation of Transformers: A Survey from the Perspective of Position Encoding

Zhao, Liang, Feng, Xiaocheng, Feng, Xiachong, Qin, Bing, Liu, Ting

arXiv.org Artificial Intelligence | Dec-28-2023

The Transformer has taken the natural language processing (NLP) field by storm since its birth, owing to its superior ability to model complex dependencies in sequences. Despite the great success of Transformer-based pretrained language models (PLMs) across almost all NLP tasks, they all suffer from a preset length limit and thus can hardly extend this success to sequences longer than those seen in training, namely the length extrapolation problem. Length extrapolation has aroused great interest among researchers, as it is the core feature of human language capacity. To enhance length extrapolation of Transformers, a plethora of methods have been proposed, mostly focusing on extrapolatable position encodings. In this article, we provide an organized and systematic review of these research efforts in a unified notation from a position encoding perspective, aiming to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.

large language model, machine learning, natural language, (19 more...)

2312.17044

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

Feng, Zhangyin, Ma, Weitao, Yu, Weijiang, Huang, Lei, Wang, Haotian, Chen, Qianglong, Peng, Weihua, Feng, Xiaocheng, Qin, Bing, Liu, Ting

arXiv.org Artificial Intelligence | Dec-7-2023

Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. To address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects. Nevertheless, a comprehensive survey is still notably absent. In this paper, we present a review of the trends in the integration of knowledge and large language models, including a taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of different methods and point out potential future research directions. We hope this survey offers the community quick access to and a comprehensive overview of this research area, with the intention of inspiring future research endeavors.

large language model, machine learning, natural language, (20 more...)

2311.05876

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment (0.68)
Government (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
