AITopics | quantifying

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

Neural Information Processing Systems, Dec-25-2025, 02:11:24 GMT

Retrieval augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, document retrieval is an imprecise task and sometimes places erroneous or even harmful content in context, which raises the question of how LLMs handle retrieved information: If the provided content is incorrect, does the model know to ignore it, or does it recapitulate the error? Conversely, when the model's initial response is incorrect, does it always know to use the retrieved information to correct itself, or does it insist on its wrong prior response? To answer this, we curate a dataset of over 1200 questions across six domains (e.g., drug dosages, Olympic records, locations) along with content relevant to answering each question. We further apply precise perturbations to the answers in the content that range from subtle to blatant errors. We benchmark six top-performing LLMs, including GPT-4o, on this dataset and find that LLMs are susceptible to adopting incorrect retrieved content, overriding their own correct prior knowledge over 60% of the time. However, the more unrealistic the retrieved content is (i.e., the more it deviates from the truth), the less likely the model is to adopt it. Also, the less confident a model is in its initial response (as measured by token probabilities), the more likely it is to adopt the information in the retrieved content. We exploit this finding and demonstrate simple methods for improving model accuracy when retrieved content conflicts with the model's prior. Our results highlight a difficult task and benchmark for LLMs: namely, their ability to recognize when they are wrong in light of correct retrieved content and to reject the provided content when it is incorrect.
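
The abstract reports that the less confident a model is in its prior answer (measured via token probabilities), the more likely it is to adopt the retrieved content. A minimal sketch of one decision rule that exploits this, assuming you can obtain per-token log-probabilities for both the closed-book answer and the context-grounded answer. The function names and the geometric-mean score are illustrative assumptions, not necessarily the authors' method.

```python
import math
from typing import Sequence


def mean_token_prob(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated answer (hypothetical score)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def choose_answer(prior_answer: str, prior_logprobs: Sequence[float],
                  context_answer: str, context_logprobs: Sequence[float]) -> str:
    """Keep the prior (closed-book) answer only when the model was more
    confident in it than in the answer it produced with retrieved context."""
    if mean_token_prob(prior_logprobs) > mean_token_prob(context_logprobs):
        return prior_answer
    return context_answer
```

For example, `choose_answer("20 mg", [-0.05, -0.1], "200 mg", [-1.2, -0.9])` keeps the confident prior "20 mg" instead of the less confident context-derived dosage. A tuned threshold on the confidence gap would be a natural variant.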

large language model, machine learning, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Shapley Residuals: Quantifying the limits of the Shapley value for explanations

Neural Information Processing Systems, Dec-25-2025, 01:41:31 GMT

Popular feature importance techniques compute additive approximations to nonlinear models by first defining a cooperative game describing the value of different subsets of the model's features, then calculating the resulting game's Shapley values to attribute credit additively between the features. However, the specific modeling settings in which the Shapley values are a poor approximation for the true game have not been well-described. In this paper we utilize an interpretation of Shapley values as the result of an orthogonal projection between vector spaces to calculate a residual representing the kernel component of that projection. We provide an algorithm for computing these residuals, characterize different modeling settings based on the value of the residuals, and demonstrate that they capture information about model predictions that Shapley values cannot. Shapley residuals can thus act as a warning to practitioners against overestimating the degree to which Shapley-value-based explanations give them insight into a model.
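
To make the projection view concrete, the sketch below assumes a least-squares (Hodge-style) formulation on the hypercube of coalitions. For each player i, it takes the player's marginal contributions as a flow on the edges that add i, projects that flow onto gradients of set functions u with u(empty set) = 0, reads the Shapley value off as u(grand coalition), and reports the norm of what the projection leaves behind. This is our own illustration of the general idea; the function name is hypothetical and the paper's exact residual definition may differ.

```python
import numpy as np


def shapley_via_projection(v, n):
    """v maps a frozenset of player indices to a float; n is the number of players.
    Returns (Shapley values, residual norms) from a least-squares projection."""
    masks = range(1 << n)
    value = np.array([v(frozenset(i for i in range(n) if m >> i & 1)) for m in masks])
    # Hypercube edges S -> S + {i} for every player i not in S.
    edges = [(m, m | (1 << i), i) for m in masks for i in range(n) if not m >> i & 1]
    D = np.zeros((len(edges), 1 << n))  # discrete gradient (edge-vertex incidence)
    for e, (tail, head, _) in enumerate(edges):
        D[e, head], D[e, tail] = 1.0, -1.0
    phis, residual_norms = [], []
    for i in range(n):
        # Player i's marginal contributions, placed only on edges that add player i.
        f = np.array([value[h] - value[t] if p == i else 0.0 for t, h, p in edges])
        # Least-squares fit of grad(u) to f with u(empty set) = 0 (drop column 0).
        sol, *_ = np.linalg.lstsq(D[:, 1:], f, rcond=None)
        u = np.concatenate([[0.0], sol])
        phis.append(u[-1])  # u at the grand coalition (mask 2**n - 1)
        residual_norms.append(np.linalg.norm(D @ u - f))
    return np.array(phis), np.array(residual_norms)
```

For an additive game the residuals vanish. For the two-player "AND" game (value 1 only when both players are present), each player gets 0.5, but the residual is nonzero, signalling an interaction that the additive explanation cannot express.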

quantifying, shapley residual, shapley value

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.41)

Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality

Neural Information Processing Systems, Dec-24-2025, 21:27:03 GMT

We consider the problem of estimating the Wasserstein distance between the empirical measure and a set of probability measures whose expectations over a class of functions (the hypothesis class) are constrained. If this class is rich enough to characterize a particular distribution (e.g., all Lipschitz functions), then our formulation recovers the Wasserstein distance to that distribution. We establish a strong duality result that generalizes the celebrated Kantorovich-Rubinstein duality. We also show that our formulation can be used to beat the curse of dimensionality, which is well known to affect the rates of statistical convergence of the empirical Wasserstein distance. In particular, we present examples of infinite-dimensional hypothesis classes, informed by a complex correlation structure, for which the empirical Wasserstein distance to such classes converges to zero at the standard parametric rate. Our formulation provides insights that help clarify why, despite the curse of dimensionality, the Wasserstein distance enjoys favorable empirical performance across a wide range of statistical applications.
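
The curse of dimensionality mentioned above is easy to observe numerically. The sketch below illustrates the problem the paper addresses, not its remedy. It draws two independent n-point samples from the same uniform distribution on [0,1]^d and computes the exact W1 distance between their empirical measures. For equal-size uniform empirical measures an optimal coupling can be taken to be a permutation (Birkhoff-von Neumann), so an assignment solver suffices. The sample size and the uniform distribution are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def empirical_w1(x, y):
    """Exact W1 between two uniform empirical measures with equally many atoms."""
    cost = cdist(x, y)  # pairwise Euclidean ground cost
    rows, cols = linear_sum_assignment(cost)  # optimal permutation coupling
    return cost[rows, cols].mean()


def two_sample_w1(d, n=200, seed=0):
    """W1 between two independent n-samples from Uniform([0, 1]^d)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, d))
    y = rng.uniform(size=(n, d))
    return empirical_w1(x, y)


for d in (1, 2, 10):
    print(d, two_sample_w1(d))
```

Both samples come from the same distribution, so the true distance is zero. Yet at a fixed n the estimate is far larger in d = 10 than in d = 1, reflecting the slow, dimension-dependent convergence rate.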

empirical wasserstein distance, name change, quantifying

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Quantifying the Gain in Weak-to-Strong Generalization

Neural Information Processing Systems, May-27-2025, 19:56:31 GMT

Recent advances in large language models have shown capabilities that are extraordinary and near-superhuman. These models operate with such complexity that reliably evaluating and aligning them proves challenging for humans. This leads to a natural question: can guidance from weak models (like humans) adequately direct the capabilities of strong models? In a recent and somewhat surprising work, Burns et al. (2023) empirically demonstrated that when strong models (like GPT-4) are finetuned using labels generated by weak supervisors (like GPT-2), the strong models outperform their weaker counterparts, a phenomenon they term weak-to-strong generalization. In this work, we present a theoretical framework for understanding weak-to-strong generalization. Specifically, we show that the improvement in performance achieved by strong models over their weaker counterparts is quantified by the misfit error incurred by the strong model on labels generated by the weaker model.
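
The claim that the gain equals the misfit is easiest to check in the simplest instance we can construct, under our own assumptions: squared loss, a strong hypothesis class that is a linear subspace (the span of fixed features X) containing the target, and a strong model fit by least squares to the weak labels. The fit is then an orthogonal projection, and Pythagoras gives weak error minus strong error equal to the misfit exactly. For a general convex class the same argument yields an inequality (gain at least the misfit). The feature matrix, noise level, and variable names below are hypothetical.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two label vectors."""
    return np.mean((a - b) ** 2)


rng = np.random.default_rng(0)
n, k = 500, 5
X = rng.normal(size=(n, k))                       # strong model's features (hypothetical)
y_true = X @ rng.normal(size=k)                   # target lies in the strong class
y_weak = y_true + rng.normal(scale=0.5, size=n)   # noisy weak-supervisor labels

# Strong model: least-squares fit to weak labels = projection onto span(X).
beta, *_ = np.linalg.lstsq(X, y_weak, rcond=None)
y_strong = X @ beta

weak_err = mse(y_weak, y_true)
strong_err = mse(y_strong, y_true)
misfit = mse(y_strong, y_weak)
print(weak_err - strong_err, misfit)  # equal up to floating-point error
```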

quantifying, strong model, weak-to-strong generalization

Country: Asia > Afghanistan > Parwan Province > Charikar (0.09)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence

Neural Information Processing Systems, May-26-2025, 21:46:58 GMT

Retrieval augmented generation (RAG) is frequently used to mitigate hallucinations and provide up-to-date knowledge for large language models (LLMs). However, document retrieval is an imprecise task and sometimes places erroneous or even harmful content in context, which raises the question of how LLMs handle retrieved information: If the provided content is incorrect, does the model know to ignore it, or does it recapitulate the error? Conversely, when the model's initial response is incorrect, does it always know to use the retrieved information to correct itself, or does it insist on its wrong prior response? To answer this, we curate a dataset of over 1200 questions across six domains (e.g., drug dosages, Olympic records, locations) along with content relevant to answering each question. We further apply precise perturbations to the answers in the content that range from subtle to blatant errors. We benchmark six top-performing LLMs, including GPT-4o, on this dataset and find that LLMs are susceptible to adopting incorrect retrieved content, overriding their own correct prior knowledge over 60% of the time.

large language model, machine learning, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Quantifying the Ease of Reproducing Training Data in Unconditional Diffusion Models

Hasegawa, Masaya, Yasuda, Koji

arXiv.org Artificial Intelligence, Mar-25-2025

Diffusion models, which have been advancing rapidly in recent years, may generate samples that closely resemble the training data. This phenomenon, known as memorization, may lead to copyright issues. In this study, we propose a method to quantify the ease of reproducing training data in unconditional diffusion models. The average of a sample population following the Langevin equation in the reverse diffusion process moves according to a first-order ordinary differential equation (ODE). This ODE establishes a 1-to-1 correspondence between images and their noisy counterparts in the latent space. Since the ODE is reversible and the initial noisy images are sampled randomly, the volume of an image's projected area represents the probability of generating those images. We examined the ODE, which projects images to latent space, and succeeded in quantifying the ease of reproducing training data by measuring the volume growth rate in this process. Given the relatively low computational complexity of this method, it allows us to enhance the quality of training data by detecting and modifying the easily memorized training samples.
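
Measuring volume growth along an ODE rests on a general identity, Liouville's formula: along a trajectory of dx/dt = f(x, t), the log-determinant of the flow's Jacobian changes at rate div f. The toy sketch below integrates that rate alongside the state. It assumes a generic, cheap vector field f rather than the authors' trained diffusion model, and uses explicit Euler steps and finite-difference divergence purely for illustration. The abstract does not say how the authors compute the rate.

```python
import numpy as np


def divergence(f, x, t, eps=1e-5):
    """Central-difference estimate of div f(x, t) = sum_i d f_i / d x_i."""
    total = 0.0
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        total += (f(x + e, t)[i] - f(x - e, t)[i]) / (2 * eps)
    return total


def log_volume_change(f, x0, t0=0.0, t1=1.0, steps=1000):
    """Integrate dx/dt = f(x, t) with explicit Euler steps while accumulating
    d/dt log|det J| = div f(x, t) along the trajectory (Liouville's formula)."""
    x = np.asarray(x0, dtype=float).copy()
    dt = (t1 - t0) / steps
    t, log_vol = t0, 0.0
    for _ in range(steps):
        log_vol += divergence(f, x, t) * dt
        x = x + f(x, t) * dt
        t += dt
    return x, log_vol
```

For a linear field f(x, t) = A x the divergence is the constant trace(A), so the accumulated log-volume change over [0, 1] equals trace(A). For high-dimensional images, one would likely replace the per-coordinate loop with a stochastic trace estimator such as Hutchinson's (our suggestion, not the paper's stated method).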

artificial intelligence, growth rate, machine learning

arXiv: 2503.19429

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.05)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Norfolk County > Brookline (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data

Yang, Shiping, Wu, Jie, Ding, Wenbiao, Wu, Ning, Liang, Shining, Gong, Ming, Zhang, Hengyuan, Zhang, Dongmei

arXiv.org Artificial Intelligence, Mar-7-2025

Robustness has become a critical attribute for the deployment of RAG systems in real-world applications. Existing research focuses on robustness to explicit noise (e.g., document semantics) but overlooks spurious features (a.k.a. implicit noise). While previous works have explored spurious features in LLMs, they are limited to specific features (e.g., formats) and narrow scenarios (e.g., ICL). In this work, we statistically confirm the presence of spurious features in the RAG paradigm, a robustness problem caused by the sensitivity of LLMs to semantic-agnostic features. Moreover, we provide a comprehensive taxonomy of spurious features and empirically quantify their impact through controlled experiments. Further analysis reveals that not all spurious features are harmful; some can even be beneficial. Extensive evaluation results across multiple LLMs suggest that spurious features are a widespread and challenging problem in the field of RAG. To facilitate future research, we release all code and data at https://github.com/maybenotime/RAG-SpuriousFeatures.
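
A controlled experiment of the kind described above can be framed as: apply a meaning-preserving change to the grounding document and count how often the model's answer changes. The sketch below assumes a hypothetical `answer_fn(question, document)` wrapping your RAG model. The two perturbations (a naive sentence-to-bullet reformatting and an HTML wrapper) and the "flip rate" metric are our illustrative choices, not the paper's taxonomy or code.

```python
from typing import Callable, Sequence


def flip_rate(answer_fn: Callable[[str, str], str],
              examples: Sequence[tuple[str, str]],
              perturb: Callable[[str], str]) -> float:
    """Fraction of (question, document) pairs whose answer changes when the
    document receives a semantics-preserving perturbation."""
    if not examples:
        raise ValueError("need at least one example")
    flips = sum(answer_fn(q, doc) != answer_fn(q, perturb(doc)) for q, doc in examples)
    return flips / len(examples)


# Hypothetical semantics-preserving perturbations of a grounding document.
def to_markdown_bullets(doc: str) -> str:
    """Naively split on '. ' and render each sentence as a bullet."""
    return "\n".join(f"- {s.strip()}" for s in doc.split(". ") if s.strip())


def add_html_wrapper(doc: str) -> str:
    """Wrap the unchanged text in minimal HTML markup."""
    return f"<html><body><p>{doc}</p></body></html>"
```

A flip rate near zero suggests the model ignores the feature. A high rate flags sensitivity, which, as the abstract notes, is not always harmful, so accuracy under each perturbation is worth tracking too.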

perturbation, robustness, spurious feature

arXiv: 2503.05587

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Review for NeurIPS paper: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality

Neural Information Processing Systems, Feb-8-2025, 04:05:34 GMT

Summary and Contributions: ***** UPDATE ***** I realize I might have been harsh in my evaluation. I believe the paper would have been more suited for a more theory oriented statistics conference / journal, but this is a recurrent problem in NeurIPS and I shouldn't have taken it out on the authors. While their theoretical result is really interesting, I also didn't appreciate that the authors barely mentioned previous work on statistical learning bounds with optimal transport. There have been recent efforts on the topic by several teams, and they should at least acknowledge them. However, if other reviewers took the time to thoroughly review the proof of the main result, I'm willing to increase my score.

dimensionality, empirical wasserstein distance, neurips paper

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.40)

Review for NeurIPS paper: Quantifying the Empirical Wasserstein Distance to a Set of Measures: Beating the Curse of Dimensionality

Neural Information Processing Systems, Feb-8-2025, 04:05:27 GMT

Most of the reviewers were excited about this work, and I'm pleased to recommend it for publication. In the revision, please address all promised changes in the rebuttals and/or requested in the reviews. The outlier R1 has some valid points about the exposition as well as discomfort with the length of the appendix (it's true this is difficult to review in the NeurIPS environment), but these are not reasons to reject the work. That said, the authors of this paper are encouraged to take R1's expository suggestions seriously in their revision to make the work as approachable as possible.

dimensionality, empirical wasserstein distance, neurips paper

Technology:

Information Technology > Data Science > Data Mining (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.40)

Shapley Residuals: Quantifying the limits of the Shapley value for explanations

Neural Information Processing Systems, Jan-19-2025, 10:25:33 GMT

Popular feature importance techniques compute additive approximations to nonlinear models by first defining a cooperative game describing the value of different subsets of the model's features, then calculating the resulting game's Shapley values to attribute credit additively between the features. However, the specific modeling settings in which the Shapley values are a poor approximation for the true game have not been well-described. In this paper we utilize an interpretation of Shapley values as the result of an orthogonal projection between vector spaces to calculate a residual representing the kernel component of that projection. We provide an algorithm for computing these residuals, characterize different modeling settings based on the value of the residuals, and demonstrate that they capture information about model predictions that Shapley values cannot. Shapley residuals can thus act as a warning to practitioners against overestimating the degree to which Shapley-value-based explanations give them insight into a model.

explanation, shapley residual, shapley value

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.45)
