AITopics | Li, Xiaoguang

Collaborating Authors

Li, Xiaoguang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities

Zhuang, Tianyi, Kuang, Chuqiao, Li, Xiaoguang, Teng, Yihua, Wu, Jihao, Wang, Yasheng, Shang, Lifeng

arXiv.org Artificial IntelligenceFeb-24-2025

We present DocPuzzle, a rigorously constructed benchmark for evaluating long-context reasoning capabilities in large language models (LLMs). This benchmark comprises 100 expert-level QA problems requiring multi-step reasoning over long real-world documents. To ensure the task quality and complexity, we implement a human-AI collaborative annotation-validation pipeline. DocPuzzle introduces an innovative evaluation framework that mitigates guessing bias through checklist-guided process analysis, establishing new standards for assessing reasoning capacities in LLMs. Our evaluation results show that: 1)Advanced slow-thinking reasoning models like o1-preview(69.7%) and DeepSeek-R1(66.3%) significantly outperform best general instruct models like Claude 3.5 Sonnet(57.7%); 2)Distilled reasoning models like DeepSeek-R1-Distill-Qwen-32B(41.3%) falls far behind the teacher model, suggesting challenges to maintain the generalization of reasoning capabilities relying solely on distillation.

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.17807

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Add feedback

CiteCheck: Towards Accurate Citation Faithfulness Detection

Xu, Ziyao, Wei, Shaohang, Han, Zhuoheng, Jin, Jing, Yang, Zhe, Li, Xiaoguang, Tan, Haochen, Guo, Zhijiang, Wang, Houfeng

arXiv.org Artificial IntelligenceFeb-15-2025

Citation faithfulness detection is critical for enhancing retrieval-augmented generation (RAG) systems, yet large-scale Chinese datasets for this task are scarce. Existing methods face prohibitive costs due to the need for manually annotated negative samples. To address this, we introduce the first large-scale Chinese dataset CiteCheck for citation faithfulness detection, constructed via a cost-effective approach using two-stage manual annotation. This method balances positive and negative samples while significantly reducing annotation expenses. CiteCheck comprises training and test splits. Experiments demonstrate that: (1) the test samples are highly challenging, with even state-of-the-art LLMs failing to achieve high accuracy; and (2) training data augmented with LLM-generated negative samples enables smaller models to attain strong performance using parameter-efficient fine-tuning. CiteCheck provides a robust foundation for advancing citation faithfulness detection in Chinese RAG systems. The dataset is publicly available to facilitate research.

information, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.10881

Country:

Asia (0.94)
North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression

Zhang, Jiebin, Zhu, Dawei, Song, Yifan, Wu, Wenhao, Kuang, Chuqiao, Li, Xiaoguang, Shang, Lifeng, Liu, Qun, Li, Sujian

arXiv.org Artificial IntelligenceDec-17-2024

As large language models (LLMs) process increasing context windows, the memory usage of KV cache has become a critical bottleneck during inference. The mainstream KV compression methods, including KV pruning and KV quantization, primarily focus on either token or precision dimension and seldom explore the efficiency of their combination. In this paper, we comprehensively investigate the token-precision trade-off in KV cache compression. Experiments demonstrate that storing more tokens in the KV cache with lower precision, i.e., quantized pruning, can significantly enhance the long-context performance of LLMs. Furthermore, in-depth analysis regarding token-precision trade-off from a series of key aspects exhibit that, quantized pruning achieves substantial improvements in retrieval-related tasks and consistently performs well across varying input lengths. Moreover, quantized pruning demonstrates notable stability across different KV pruning methods, quantization strategies, and model scales. These findings provide valuable insights into the token-precision trade-off in KV cache compression. We plan to release our code in the near future.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2412.12706

Country:

North America > United States (0.14)
Asia > Thailand (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Evaluating the External and Parametric Knowledge Fusion of Large Language Models

Zhang, Hao, Zhang, Yuyang, Li, Xiaoguang, Shi, Wenxuan, Xu, Haonan, Liu, Huanshuo, Wang, Yasheng, Shang, Lifeng, Liu, Qun, Liu, Yong, Tang, Ruiming

arXiv.org Artificial IntelligenceMay-29-2024

Integrating external knowledge into large language models (LLMs) presents a promising solution to overcome the limitations imposed by their antiquated and static parametric memory. Prior studies, however, have tended to over-reliance on external knowledge, underestimating the valuable contributions of an LLMs' intrinsic parametric knowledge. The efficacy of LLMs in blending external and parametric knowledge remains largely unexplored, especially in cases where external knowledge is incomplete and necessitates supplementation by their parametric knowledge. We propose to deconstruct knowledge fusion into four distinct scenarios, offering the first thorough investigation of LLM behavior across each. We develop a systematic pipeline for data construction and knowledge infusion to simulate these fusion scenarios, facilitating a series of controlled experiments. Our investigation reveals that enhancing parametric knowledge within LLMs can significantly bolster their capability for knowledge integration. Nonetheless, we identify persistent challenges in memorizing and eliciting parametric knowledge, and determining parametric knowledge boundaries. Our findings aim to steer future explorations on harmonizing external and parametric knowledge within LLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.1901

Country: North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Retrieval-based Full-length Wikipedia Generation for Emergent Events

Zhang, Jiebin, Yu, Eugene J., Chen, Qinyu, Xiong, Chenhao, Zhu, Dawei, Qian, Han, Song, Mingbo, Li, Xiaoguang, Liu, Qun, Li, Sujian

arXiv.org Artificial IntelligenceFeb-28-2024

In today's fast-paced world, the growing demand to quickly generate comprehensive and accurate Wikipedia documents for emerging events is both crucial and challenging. However, previous efforts in Wikipedia generation have often fallen short of meeting real-world requirements. Some approaches focus solely on generating segments of a complete Wikipedia document, while others overlook the importance of faithfulness in generation or fail to consider the influence of the pre-training corpus. In this paper, we simulate a real-world scenario where structured full-length Wikipedia documents are generated for emergent events using input retrieved from web sources. To ensure that Large Language Models (LLMs) are not trained on corpora related to recently occurred events, we select events that have taken place recently and introduce a new benchmark Wiki-GenBen, which consists of 309 events paired with their corresponding retrieved web pages for generating evidence. Additionally, we design a comprehensive set of systematic evaluation metrics and baseline methods, to evaluate the capability of LLMs in generating factual full-length Wikipedia documents. The data and code are open-sourced at WikiGenBench.

information, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2402.18264

Country:

North America > United States > Michigan (0.14)
North America > United States > Tennessee (0.14)
Europe > United Kingdom > England (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions

Hu, Xuming, Li, Xiaochuan, Chen, Junzhe, Li, Yinghui, Li, Yangning, Li, Xiaoguang, Wang, Yasheng, Liu, Qun, Wen, Lijie, Yu, Philip S., Guo, Zhijiang

arXiv.org Artificial IntelligenceFeb-25-2024

Generative search engines have the potential to transform how people seek information online, but generated responses from existing large language models (LLMs)-backed generative search engines may not always be accurate. Nonetheless, retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable part of a claim. To this end, we propose evaluating the robustness of generative search engines in the realistic and high-risk setting, where adversaries have only black-box system access and seek to deceive the model into returning incorrect responses. Through a comprehensive human evaluation of various generative search engines, such as Bing Chat, PerplexityAI, and YouChat across diverse queries, we demonstrate the effectiveness of adversarial factual questions in inducing incorrect responses. Moreover, retrieval-augmented generation exhibits a higher susceptibility to factual errors compared to LLMs without retrieval. These findings highlight the potential security risks of these systems and emphasize the need for rigorous evaluation before deployment.

information retrieval, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.12077

Country:

North America > United States > New York (0.14)
North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer

Hu, Xinshuo, Hu, Baotian, Li, Dongfang, Li, Xiaoguang, Shang, Lifeng

arXiv.org Artificial IntelligenceFeb-22-2024

The present study introduces the knowledge-augmented generator, which is specifically designed to produce information that remains grounded in contextual knowledge, regardless of alterations in the context. Previous research has predominantly focused on examining hallucinations stemming from static input, such as in the domains of summarization or machine translation. However, our investigation delves into the faithfulness of generative question answering in the presence of dynamic knowledge. Our objective is to explore the existence of hallucinations arising from parametric memory when contextual knowledge undergoes changes, while also analyzing the underlying causes for their occurrence. In order to efficiently address this issue, we propose a straightforward yet effective measure for detecting such hallucinations. Intriguingly, our investigation uncovers that all models exhibit a tendency to generate previous answers as hallucinations. To gain deeper insights into the underlying causes of this phenomenon, we conduct a series of experiments that verify the critical role played by context in hallucination, both during training and testing, from various perspectives.

computational linguistic, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2402.14488

Country:

Europe (1.00)
North America > United States > Louisiana (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(2 more...)

Genre: Research Report > New Finding (0.47)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.66)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

Tan, Haochen, Guo, Zhijiang, Shi, Zhan, Xu, Lu, Liu, Zhili, Feng, Yunlong, Li, Xiaoguang, Wang, Yasheng, Shang, Lifeng, Liu, Qun, Song, Linqi

arXiv.org Artificial IntelligenceFeb-11-2024

Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long contents, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we introduce \textsc{ProxyQA}, a framework for evaluating long-form text generation, comprising in-depth human-curated \textit{meta-questions} spanning various domains. Each meta-question contains corresponding \textit{proxy-questions} with annotated answers. LLMs are prompted to generate extensive content in response to these meta-questions. Utilizing an evaluator and incorporating generated content as background context, \textsc{ProxyQA} evaluates the quality of generated content based on the evaluator's performance in answering the \textit{proxy-questions}. We examine multiple LLMs, emphasizing \textsc{ProxyQA}'s demanding nature as a high-quality assessment tool. Human evaluation demonstrates that evaluating through \textit{proxy-questions} is a highly self-consistent and human-criteria-correlated validation method. The dataset and leaderboard will be available at \url{https://github.com/Namco0816/ProxyQA}.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.15042

Country:

Europe (0.93)
Asia > Middle East > UAE (0.14)
North America > United States > Michigan (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Thakur, Nandan, Bonifacio, Luiz, Zhang, Xinyu, Ogundepo, Odunayo, Kamalloo, Ehsan, Alfonso-Hermelo, David, Li, Xiaoguang, Liu, Qun, Chen, Boxing, Rezagholizadeh, Mehdi, Lin, Jimmy

arXiv.org Artificial IntelligenceDec-18-2023

Retrieval-augmented generation (RAG) grounds large language model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations. However, prior works lack a comprehensive evaluation of different language families, making it challenging to evaluate LLM robustness against errors in external retrieved knowledge. To overcome this, we establish NoMIRACL, a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages. NoMIRACL includes both a non-relevant and a relevant subset. Queries in the non-relevant subset contain passages manually judged as non-relevant or noisy, whereas queries in the relevant subset include at least a single judged relevant passage. We measure LLM robustness using two metrics: (i) hallucination rate, measuring model tendency to hallucinate an answer, when the answer is not present in passages in the non-relevant subset, and (ii) error rate, measuring model inaccuracy to recognize relevant passages in the relevant subset. We build a GPT-4 baseline which achieves a 33.2% hallucination rate on the non-relevant and a 14.9% error rate on the relevant subset on average. Our evaluation reveals that GPT-4 hallucinates frequently in high-resource languages, such as French or English. This work highlights an important avenue for future research to improve LLM robustness to learn how to better reject non-relevant information in RAG.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2312.11361

Country:

Europe (1.00)
North America > Canada > Ontario (0.14)
North America > United States > Hawaii (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Retrieval-based Disentanglement with Distant Supervision

Zhou, Jiawei, Li, Xiaoguang, Shang, Lifeng, Jiang, Xin, Liu, Qun, Chen, Lei

arXiv.org Artificial IntelligenceDec-15-2022

Disentangled representation learning remains challenging as ground truth factors of variation do not naturally exist. To address this, we present Vocabulary Disentanglement Retrieval~(VDR), a simple yet effective retrieval-based disentanglement framework that leverages nature language as distant supervision. Our approach is built upon the widely-used bi-encoder architecture with disentanglement heads and is trained on data-text pairs that are readily available on the web or in existing datasets. This makes our approach task- and modality-agnostic with potential for a wide range of downstream applications. We conduct experiments on 16 datasets in both text-to-text and cross-modal scenarios and evaluate VDR in a zero-shot setting. With the incorporation of disentanglement heads and a minor increase in parameters, VDR achieves significant improvements over the base retriever it is built upon, with a 9% higher on NDCG@10 scores in zero-shot text-to-text retrieval and an average of 13% higher recall in cross-modal retrieval. In comparison to other baselines, VDR outperforms them in most tasks, while also improving explainability and efficiency.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.07699

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)

Add feedback