AITopics

2605.20468

Country: Asia (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (0.89)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Jia, Mumin, Chen, Yilin, Sharma, Divya, Diaz-Rodriguez, Jairo

When Can Digital Personas Reliably Approximate Human Survey Findings?

arXiv.org Machine LearningMay-12-2026

Digital personas powered by Large Language Models (LLMs) are increasingly proposed as substitutes for human survey respondents, yet it remains unclear when they can reliably approximate human survey findings. We answer this question using the LISS panel, constructing personas from respondents' background variables and pre-2023 survey histories, then testing them against the same respondents' held-out post-cutoff answers. Across four persona architectures, three LLMs, and two prediction tasks, we assess performance at the question, respondent, distributional, equity, and clustering levels. Digital personas improve alignment with human response distributions, especially in domains tied to stable attributes and values, but remain limited for individual prediction and fail to recover multivariate respondent structure. Retrieval-augmented architectures provide the clearest gains, but performance depends more on human response structure than on model choice: personas perform best for low-variability questions and common respondent patterns, and worst for subjective, heterogeneous, or rare responses. Our results provide practical guidance on when digital personas could be appropriate for survey research and when human validation remains necessary.

artificial intelligence, large language model, natural language, (17 more...)

2605.10659

Country:

North America > United States (0.67)
North America > Canada > Ontario (0.14)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing SystemsMay-1-2026, 06:36:25 GMT

Hardware Resilience Properties of Text-Guided Image Classifiers

This paper presents a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors. By utilizing enriched text embeddings derived from GPT-3 with question prompts per class and CLIP pretrained text encoder, we investigate their impact as an initialization for the classification layer. Our approach achieves a remarkable 5.5 average increase in hardware reliability (and up to 14) across various architectures in the most critical layer, with minimal accuracy drop (0.3% on average) compared to baseline PyTorch models. Furthermore, our method seamlessly integrates with any image classification backbone, showcases results across various network architectures, decreases parameter and FLOPs overhead, and follows a consistent training recipe. This research offers a practical and efficient solution to bolster the robustness of image classification models against hardware failures, with potential implications for future studies in this domain.

artificial intelligence, machine learning, natural language, (19 more...)

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry: Automobiles & Trucks (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsApr-24-2026, 13:30:45 GMT

On the Epistemic Limits of Personalized Prediction

Machine learning models are often personalized by using group attributes that encode personal characteristics (e.g., sex, age group, HIV status). In such settings, individuals expect to receive more accurate predictions in return for disclosing group attributes to the personalized model. We study when we can tell that a personalized model upholds this principle for every group who provides personal data. We introduce a metric called the benefit of personalization (BoP) to measure the smallest gain in accuracy that any group expects to receive from a personalized model. We describe how the BoP can be used to carry out basic routines to audit a personalized model, including: (i) hypothesis tests to check that a personalized model improves performance for every group; (ii) estimation procedures to bound the minimum gain in personalization. We characterize the reliability of these routines in a finite-sample regime and present minimax bounds on both the probability of error for BoP hypothesis tests and the mean-squared error of BoP estimates. Our results show that we can only claim that personalization improves performance for each group who provides data when we explicitly limit the number of group attributes used by a personalized model. In particular, we show that it is impossible to reliably verify that a personalized classifier with k 19 binary group attributes will benefit every group who provides personal data using a dataset of n = 8 109 samples - one for each person in the world.

artificial intelligence, machine learning, personalization, (19 more...)

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Sara Magliacane, Tom Claassen, Joris M. Mooij

Ancestral Causal Inference

Neural Information Processing SystemsApr-22-2026, 13:44:34 GMT

Constraint-based causal discovery from limited data is a notoriously difficult challenge due to the many borderline independence test decisions. Several approaches to improve the reliability of the predictions by exploiting redundancy in the independence information have been proposed recently. Though promising, existing approaches can still be greatly improved in terms of accuracy and scalability. We present a novel method that reduces the combinatorial explosion of the search space by using a more coarse-grained representation of causal information, drastically reducing computation time. Additionally, we propose a method to score causal predictions based on their confidence. Crucially, our implementation also allows one to easily combine observational and interventional data and to incorporate various types of available background knowledge. We prove soundness and asymptotic consistency of our method and demonstrate that it can outperform the state-ofthe-art on synthetic data, achieving a speedup of several orders of magnitude. We illustrate its practical feasibility by applying it to a challenging protein data set.

artificial intelligence, machine learning, relation, (18 more...)

Country: Europe (0.93)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Ao, Ruicheng, Gao, Siyang, Simchi-Levi, David

On the Reliability Limits of LLM-Based Multi-Agent Planning

arXiv.org Machine LearningMar-31-2026

This technical note studies the reliability limits of LLM-based multi-agent planning as a delegated decision problem. We model the LLM-based multi-agent architecture as a finite acyclic decision network in which multiple stages process shared model-context information, communicate through language interfaces with limited capacity, and may invoke human review. We show that, without new exogenous signals, any delegated network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. In the common-evidence regime, this implies that optimizing over multi-agent directed acyclic graphs under a finite communication budget can be recast as choosing a budget-constrained stochastic experiment on the shared signal. We also characterize the loss induced by communication and information compression. Under proper scoring rules, the gap between the centralized Bayes value and the value after communication admits an expected posterior divergence representation, which reduces to conditional mutual information under logarithmic loss and to expected squared posterior error under the Brier score. These results characterize the fundamental reliability limits of delegated LLM planning. Experiments with LLMs on a controlled problem set further demonstrate these characterizations.

artificial intelligence, communication, information, (16 more...)

2603.26993

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Kosovo > District of Gjilan > Kamenica (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Lassance, Rodrigo F. L., De Bock, Jasper

Robustness Quantification for Discriminative Models: a New Robustness Metric and its Application to Dynamic Classifier Selection

arXiv.org Machine LearningMar-25-2026

Among the different possible strategies for evaluating the reliability of individual predictions of classifiers, robustness quantification stands out as a method that evaluates how much uncertainty a classifier could cope with before changing its prediction. However, its applicability is more limited than some of its alternatives, since it requires the use of generative models and restricts the analyses either to specific model architectures or discrete features. In this work, we propose a new robustness metric applicable to any probabilistic discriminative classifier and any type of features. We demonstrate that this new metric is capable of distinguishing between reliable and unreliable predictions, and use this observation to develop new strategies for dynamic classifier selection.

artificial intelligence, detavernier, machine learning, (19 more...)

2603.23318

Country:

South America > Brazil (0.05)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Charpentier, Arthur, Machado, Agathe Fernandes

Decomposing Probabilistic Scores: Reliability, Information Loss and Uncertainty

arXiv.org Machine LearningMar-24-2026

Calibration is a conditional property that depends on the information retained by a predictor. We develop decomposition identities for arbitrary proper losses that make this dependence explicit. At any information level $\mathcal A$, the expected loss of an $\mathcal A$-measurable predictor splits into a proper-regret (reliability) term and a conditional entropy (residual uncertainty) term. For nested levels $\mathcal A\subseteq\mathcal B$, a chain decomposition quantifies the information gain from $\mathcal A$ to $\mathcal B$. Applied to classification with features $\boldsymbol{X}$ and score $S=s(\boldsymbol{X})$, this yields a three-term identity: miscalibration, a {\em grouping} term measuring information loss from $\boldsymbol{X}$ to $S$, and irreducible uncertainty at the feature level. We leverage the framework to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, with explicit forms for Brier and log-loss.

artificial intelligence, calibration, machine learning, (17 more...)

2603.15232

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > Canada (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsMar-22-2026, 21:17:45 GMT

Towards Reliable Model Selection for Unsupervised Domain Adaptation: An Empirical Study and A Certified Baseline

Selecting appropriate hyperparameters is crucial for unlocking the full potential of advanced unsupervised domain adaptation (UDA) methods in unlabeled target domains. Although this challenge remains under-explored, it has recently garnered increasing attention with the proposals of various model selection methods. Reliable model selection should maintain performance across diverse UDA methods and scenarios, especially avoiding highly risky worst-case selections--selecting the model or hyperparameter with the worst performance in the pool.\textit{Are

artificial intelligence, machine learning, proceedings, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)