AITopics

Country: North America (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing Systems, Feb-17-2026, 11:35:32 GMT

dbd6b295535e44f2b8ec0c3f1da7c509-Supplemental-Conference.pdf

artificial intelligence, assumption, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

arXiv.org Machine Learning, Jan-27-2026

Falsifying Predictive Algorithm

Coston, Amanda

Empirical investigations into unintended model behavior often show that the algorithm is predicting an outcome other than the one intended. These exposés highlight the need to identify when algorithms predict unintended quantities, ideally before deploying them into consequential settings. We propose a falsification framework that provides a principled statistical test for discriminant validity: the requirement that an algorithm predict intended outcomes better than impermissible ones. Drawing on falsification practices from causal inference, econometrics, and psychometrics, our framework compares calibrated prediction losses across outcomes to assess whether the algorithm exhibits discriminant validity with respect to a specified impermissible proxy. In settings where the target outcome is difficult to observe, multiple permissible proxy outcomes may be available; our framework accommodates both this setting and the case with a single permissible proxy. Throughout, we use nonparametric hypothesis testing methods that make minimal assumptions on the data-generating process. We illustrate the method in an admissions setting, where the framework establishes discriminant validity with respect to gender but fails to establish discriminant validity with respect to race. This demonstrates how falsification can serve as an early validity check, prior to fairness or robustness analyses. We also provide analysis in a criminal justice setting, where we highlight the limitations of our framework and emphasize the need for complementary approaches to assess other aspects of construct validity and external validity.
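
The idea of comparing prediction losses across outcomes with a nonparametric test can be illustrated with a paired sign-flip permutation test. This is a minimal sketch and not the authors' procedure: it assumes per-example losses for the intended and the impermissible outcome have already been computed from calibrated predictions, and the helper name `paired_sign_flip_test` and its parameters are hypothetical.

```python
import random


def paired_sign_flip_test(loss_intended, loss_impermissible, n_perm=2000, seed=0):
    """One-sided paired sign-flip permutation test (hypothetical helper).

    Null: per-example loss differences are symmetric about zero.
    Alternative: the intended outcome is predicted with lower loss.
    Returns a permutation p-value.
    """
    if len(loss_intended) != len(loss_impermissible):
        raise ValueError("loss lists must have the same length")
    # Positive difference means the intended outcome was predicted better.
    diffs = [b - a for a, b in zip(loss_intended, loss_impermissible)]
    n = len(diffs)
    observed = sum(diffs) / n
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each paired difference.
        stat = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + hits) / (1 + n_perm)
```

Under these assumptions, a small p-value is consistent with the algorithm predicting the intended outcome better than the impermissible proxy; a large p-value means this particular test does not support that claim.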

algorithm, artificial intelligence, machine learning, (18 more...)

2601.17146

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > United Kingdom (0.14)
North America > United States > Pennsylvania (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Consumer Health (0.67)
Law > Criminal Law (0.66)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Dimitrios Milios, Raffaello Camoriano, Pietro Michiardi, Lorenzo Rosasco, Maurizio Filippone

Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification

Neural Information Processing Systems, Nov-20-2025, 19:32:37 GMT

Neural Information Processing Systems http://nips.cc/

classification, classifier, likelihood, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Italy (0.04)
Europe > France (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial Intelligence, Nov-12-2025

AIA Forecaster: Technical Report

Alur, Rohan, Stadie, Bradly C., Kang, Daniel, Chen, Ryan, McManus, Matt, Rickert, Michael, Lee, Tyler, Federici, Michael, Zhu, Richard, Fogerty, Dennis, Williamson, Hayley, Lozinski, Nina, Linsky, Aaron, Sekhon, Jasjeet S.

This technical report describes the AIA Forecaster, a Large Language Model (LLM)-based system for judgmental forecasting using unstructured data. The AIA Forecaster approach combines three core elements: agentic search over high-quality news sources, a supervisor agent that reconciles disparate forecasts for the same event, and a set of statistical calibration techniques to counter behavioral biases in large language models. On the ForecastBench benchmark (Karger et al., 2024), the AIA Forecaster achieves performance equal to human superforecasters, surpassing prior LLM baselines. In addition to reporting on ForecastBench, we also introduce a more challenging forecasting benchmark sourced from liquid prediction markets. While the AIA Forecaster underperforms market consensus on this benchmark, an ensemble combining AIA Forecaster with market consensus outperforms consensus alone, demonstrating that our forecaster provides additive information. Our work establishes a new state of the art in AI forecasting and provides practical, transferable recommendations for future research. To the best of our knowledge, this is the first work that verifiably achieves expert-level forecasting at scale.
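
To make the ensemble comparison concrete, the sketch below blends two probability forecasts and scores them with the Brier score. It is an illustration only: the report's actual ensembling method is not described here, and the function names, the linear blend, and the `weight` parameter are assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def blend(model_probs, market_probs, weight=0.5):
    """Linear blend of two forecasts; `weight` is the share given to the model.

    A simple weighted average is an assumption made for illustration.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * m + (1.0 - weight) * c
            for m, c in zip(model_probs, market_probs)]
```

Comparing `brier_score(blend(model, market), outcomes)` against `brier_score(market, outcomes)` on held-out questions is one way to check whether a forecaster adds information beyond the market consensus.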

large language model, machine learning, natural language, (19 more...)

2511.07678

Country:

North America > United States (0.93)
Europe > France (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Banking & Finance > Trading (1.00)
Leisure & Entertainment > Games > Chess (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Artificial Intelligence, Nov-12-2025

Residual Rotation Correction using Tactile Equivariance

Zhu, Yizhe, Ye, Zhang, Hu, Boce, Zhao, Haibo, Qi, Yu, Wang, Dian, Platt, Robert

However, the high cost of tactile data collection makes sample efficiency the key requirement for developing visuotactile policies. We present EquiTac, a framework that exploits the inherent SO(2) symmetry of in-hand object rotation to improve sample efficiency and generalization for visuotactile policy learning. EquiTac first reconstructs surface normals from raw RGB inputs of vision-based tactile sensors, so rotations of the normal vector field correspond to in-hand object rotations. An SO(2)-equivariant network then predicts a residual rotation action that augments a base visuomotor policy at test time, enabling real-time rotation correction without additional reorientation demonstrations. On a real robot, EquiTac accurately achieves robust zero-shot generalization to unseen in-hand orientations with very few training samples, where baselines fail even with more training data. To our knowledge, this is the first tactile learning method to explicitly encode tactile equivariance for policy learning, yielding a lightweight, symmetry-aware module that improves reliability in contact-rich tasks.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

2511.07381

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.46)

arXiv.org Artificial Intelligence, Oct-27-2025

Generalizable Hierarchical Skill Learning via Object-Centric Representation

Zhao, Haibo, Qi, Yu, Hu, Boce, Zhu, Yizhe, Chen, Ziyan, Tian, Heng, Zhu, Xupeng, Howell, Owen, Huang, Haojie, Walters, Robin, Wang, Dian, Platt, Robert

We present Generalizable Hierarchical Skill Learning (GSL), a novel framework for hierarchical policy learning that significantly improves policy generalization and sample efficiency in robot manipulation. One core idea of GSL is to use object-centric skills as an interface that bridges the high-level vision-language model and the low-level visual-motor policy. Specifically, GSL decomposes demonstrations into transferable and object-canonicalized skill primitives using foundation models, ensuring efficient low-level skill learning in the object frame. At test time, the skill-object pairs predicted by the high-level agent are fed to the low-level module, where the inferred canonical actions are mapped back to the world frame for execution. This structured yet flexible design leads to substantial improvements in sample efficiency and generalization of our method across unseen spatial arrangements, object appearances, and task compositions. In simulation, GSL trained with only 3 demonstrations per task outperforms baselines trained with 30 times more data by 15.5 percent on unseen tasks. In real-world experiments, GSL also surpasses the baseline trained with 10 times more data.

artificial intelligence, generalization, machine learning, (18 more...)

2510.21121

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Oct-9-2025, 09:10:04 GMT

A Proof of Proposition

Now suppose we are given an unlabeled target sample with unknown label shift.

artificial intelligence, assumption, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Vladimir Vovk, Ivan Petej, Valentina Fedorova

Large-scale probabilistic predictors with and without guarantees of validity

Neural Information Processing Systems, Oct-2-2025, 12:13:23 GMT

This paper studies theoretically and empirically a method of turning machine-learning algorithms into probabilistic predictors that automatically enjoys a property of validity (perfect calibration) and is computationally efficient. The price to pay for perfect calibration is that these probabilistic predictors produce imprecise (in practice, almost precise for large data sets) probabilities. When these imprecise probabilities are merged into precise probabilities, the resulting predictors, while losing the theoretical property of perfect calibration, are consistently more accurate than the existing methods in empirical studies.

algorithm, isotonic regression, predictor, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Sinaga, Kristina P., Nair, Arjun S.

Calibration Meets Reality: Making Machine Learning Predictions Trustworthy

arXiv.org Artificial Intelligence, Sep-30-2025

Post-hoc calibration methods are widely used to improve the reliability of probabilistic predictions from machine learning models. Despite their prevalence, a comprehensive theoretical understanding of these methods remains elusive, particularly regarding their performance across different datasets and model architectures. Input features play a crucial role in shaping model predictions and, consequently, their calibration. However, the interplay between feature quality and calibration performance has not been thoroughly investigated. In this work, we present a rigorous theoretical analysis of post-hoc calibration methods, focusing on Platt scaling and isotonic regression. We derive convergence guarantees, computational complexity bounds, and finite-sample performance metrics for these methods. Furthermore, we explore the impact of feature informativeness on calibration performance through controlled synthetic experiments. Our empirical evaluation spans a diverse set of real-world datasets and model architectures, demonstrating consistent improvements in calibration metrics across various scenarios. By examining calibration performance under varying feature conditions (only informative features versus complete feature spaces that include noise dimensions), we provide fundamental insights into the robustness and reliability of different calibration approaches. Our findings offer practical guidelines for selecting appropriate calibration methods based on dataset characteristics and computational constraints, bridging the gap between theoretical understanding and practical implementation in uncertainty quantification. Code and experimental data are available at: https://github.com/Ajwebdevs/calibration-analysis-experiments.
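
For readers unfamiliar with the two methods named above, here is a minimal pure-Python sketch of each. It is not taken from the linked repository: the function names, the plain gradient-descent fit, and its `lr` and `n_iter` defaults are assumptions chosen for brevity.

```python
import math


def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Platt scaling: fit p = sigmoid(a * s + b) by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(a, b, score):
    """Calibrated probability for one raw score."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def fit_isotonic(scores, labels):
    """Isotonic regression via pool-adjacent-violators.

    Returns the sorted scores and their non-decreasing fitted probabilities.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block holds [sum of labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [s for s, _ in pairs], fitted
```

Platt scaling fits a two-parameter sigmoid, so it is compact but can only express a monotone S-shaped correction; isotonic regression is a step function that can fit any monotone map and typically needs more calibration data to avoid overfitting.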

artificial intelligence, calibration, machine learning, (16 more...)

2509.23665

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Health & Medicine > Diagnostic Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)