 interpretation


'Probably' doesn't mean the same thing to your AI as it does to you

AIHub

When a human says an event is "probable" or "likely," people generally have a shared, if fuzzy, understanding of what that means. But when an AI chatbot like ChatGPT uses the same word, it's not assessing the odds the way we do, my colleagues and I found. We recently published a study in the journal NPJ Complexity that suggests that, while large language model AIs excel at conversation, they often fail to align with humans when communicating uncertainty. The research focused on words of estimative probability, which include terms like "maybe," "probably" and "almost certain." By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models.


MCAnalysis: An Open-Source Package for Preprocessing, Modelling, and Visualisation of Menstrual Cycle Effects in Digital Health Data

Delray, Kyra, Lewis, Glyn, Grace, Bola, Hayes, Joseph, Evans, Robin

arXiv.org Machine Learning

Digital Health Technologies (DHTs) including consumer wearable devices and digital health applications offer an opportunity for continuous, large-scale data collection. Wearables give insight into physiological biomarkers that help us understand the human body, through passive data collection. Such data can be collected at a regularity that would be impossible otherwise. Digital health applications provide the chance to collect diverse types of data from clinically validated surveys, GPS, and contextual inputs. This combination has the ability to make profound advances in our understanding of the factors that affect individuals on a personal and population level [Grace et al., 2025]. One of these factors is the menstrual cycle. Particularly because of its inter-individual variability, studying it requires large sample sizes, and to truly grasp its effects on the human body, it needs to be observed on a near-daily scale [Bull et al., 2019].


The man who ruined mathematics

New Scientist

Gödel's seminal work directly contradicted one of the great minds of mathematics and limited the field forever. Kurt Gödel, the man who ruined mathematics, was one of the most important thinkers of the 20th century. He was born in 1906, smack-bang in the middle of the greatest crisis that maths has ever known. Just a few decades later, he would help resolve this turmoil, but in doing so doom mathematicians to a smaller world than the one that came before. Mathematics, as an intellectual framework, is incredibly powerful. The entire point is taking one set of logical ideas and using them to build another, making maths the closest thing we have to a cognitive perpetual-motion machine - there is always a new mathematical idea lurking across the horizon, and we just need to assemble the steps to get there.


From Ground Truth to Measurement: A Statistical Framework for Human Labeling

Chew, Robert, Eckman, Stephanie, Kern, Christoph, Kreuter, Frauke

arXiv.org Machine Learning

Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent interpretations, and simple mistakes. Machine learning research commonly treats all disagreement as noise, which obscures these distinctions and limits our understanding of what models actually learn. This paper reframes annotation as a measurement process and introduces a statistical framework for decomposing labeling outcomes into interpretable sources of variation: instance difficulty, annotator bias, situational noise, and relational alignment. The framework extends classical measurement-error models to accommodate both shared and individualized notions of truth, reflecting traditional and human label variation interpretations of error, and provides a diagnostic for assessing which regime better characterizes a given task. Applying the proposed model to a multi-annotator natural language inference dataset, we find empirical evidence for all four theorized components and demonstrate the effectiveness of our approach. We conclude with implications for data-centric machine learning and outline how this approach can guide the development of a more systematic science of labeling.


Noisy Nonreciprocal Pairwise Comparisons: Scale Variation, Noise Calibration, and Admissible Ranking Regions

Magnot, Jean-Pierre

arXiv.org Machine Learning

Pairwise comparisons are widely used in decision analysis, preference modeling, and evaluation problems. In many practical situations, the observed comparison matrix is not reciprocal. This lack of reciprocity is often treated as a defect to be corrected immediately. In this article, we adopt a different point of view: part of the nonreciprocity may reflect a genuine variation in the evaluation scale, while another part is due to random perturbations. We introduce an additive model in which the unknown underlying comparison matrix is consistent but not necessarily reciprocal. The reciprocal component carries the global ranking information, whereas the symmetric component describes possible scale variation. Around this structured matrix, we add a random perturbation and show how to estimate the noise level, assess whether the scale variation remains moderate, and assign probabilities to admissible ranking regions in the sense of strict ranking by pairwise comparisons. We also compare this approach with the brutal projection onto reciprocal matrices, which suppresses all symmetric information at once. The Gaussian perturbation model is used here not because human decisions are exactly Gaussian, but because observed judgment errors often result from the accumulation of many small effects. In such a context, the central limit principle provides a natural heuristic justification for Gaussian noise. This makes it possible to derive explicit estimators and probability assessments while keeping the model interpretable for decision problems.
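As a rough illustration of the decomposition described in this abstract, the sketch below splits an observed additive comparison matrix into a reciprocal part and a symmetric part. It assumes that "reciprocal" in the additive setting means a_ij = -a_ji, so the reciprocal component is the antisymmetric part of the matrix; the function names, the example values, and the row-mean scoring rule are hypothetical choices made for illustration and are not taken from the paper. The noise estimation and the probabilities of ranking regions are not covered here.

```python
def split_additive(matrix):
    """Split a square matrix into antisymmetric and symmetric parts.

    Assumption: additive reciprocity means a_ij = -a_ji, so the
    'reciprocal' component is (A - A^T) / 2 and the symmetric
    component is (A + A^T) / 2.
    """
    n = len(matrix)
    reciprocal = [[(matrix[i][j] - matrix[j][i]) / 2 for j in range(n)]
                  for i in range(n)]
    symmetric = [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
                 for i in range(n)]
    return reciprocal, symmetric


def row_scores(reciprocal):
    """Hypothetical scoring rule: mean of each row of the reciprocal part."""
    return [sum(row) / len(row) for row in reciprocal]


# Illustrative values only: entry [i][j] is how much item i is preferred to item j.
observed = [
    [0.0, 2.0, 3.0],
    [-1.0, 0.0, 1.5],
    [-3.0, -0.5, 0.0],
]

reciprocal, symmetric = split_additive(observed)
scores = row_scores(reciprocal)
# Item indices ordered from most to least preferred.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Keeping only `reciprocal` and discarding `symmetric` corresponds, under the same assumption, to the direct projection onto reciprocal matrices that the abstract contrasts with its own approach.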


Comprehensive Description of Uncertainty in Measurement for Representation and Propagation with Scalable Precision

Darijani, Ali, Beyerer, Jürgen, Nasrollah, Zahra Sadat Hajseyed, Hoffmann, Luisa, Heizmann, Michael

arXiv.org Machine Learning

Probability theory has become the predominant framework for quantifying uncertainty across scientific and engineering disciplines, with a particular focus on measurement and control systems. However, the widespread reliance on simple Gaussian assumptions--particularly in control theory, manufacturing, and measurement systems--can result in incomplete representations and multistage lossy approximations of complex phenomena, including inaccurate propagation of uncertainty through multistage processes. This work proposes a comprehensive yet computationally tractable framework for representing and propagating quantitative attributes arising in measurement systems using Probability Density Functions (PDFs). Recognizing the constraints imposed by finite memory in software systems, we advocate for the use of Gaussian Mixture Models (GMMs), a principled extension of the familiar Gaussian framework, as they are universal approximators of PDFs whose complexity can be tuned to trade off approximation accuracy against memory and computation. From both mathematical and computational perspectives, GMMs enable high performance and, in many cases, closed-form solutions of essential operations in control and measurement. The paper presents practical applications within manufacturing and measurement contexts, especially the circular factory, demonstrating how the GMM framework supports accurate representation and propagation of measurement uncertainty and offers improved accuracy--compared to the traditional Gaussian framework--while keeping the computations tractable.
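One example of the kind of closed-form operation this abstract refers to is the sum of two independent one-dimensional quantities that are each described by a GMM: the result is again a GMM, with one component per pair of input components. The sketch below illustrates that standard identity only; the tuple representation `(weight, mean, variance)`, the function names, and the numbers are assumptions made for illustration and do not come from the paper.

```python
from itertools import product


def gmm_sum(x, y):
    """GMM of X + Y for independent X and Y.

    Each mixture is a list of (weight, mean, variance) tuples.
    For every pair of components, weights multiply while means
    and variances add.
    """
    return [(wx * wy, mx + my, vx + vy)
            for (wx, mx, vx), (wy, my, vy) in product(x, y)]


def gmm_mean(gmm):
    """Mean of a 1D Gaussian mixture."""
    return sum(w * m for w, m, _ in gmm)


def gmm_variance(gmm):
    """Variance of a 1D Gaussian mixture (second moment minus squared mean)."""
    mean = gmm_mean(gmm)
    return sum(w * (v + m * m) for w, m, v in gmm) - mean * mean


# Illustrative values only: a bimodal measurement plus a Gaussian offset.
measurement = [(0.5, 0.0, 1.0), (0.5, 4.0, 0.25)]
offset = [(1.0, 1.0, 0.5)]

total = gmm_sum(measurement, offset)
```

The number of components in the result is the product of the input component counts, which is one place where a trade-off between accuracy and memory, as mentioned in the abstract, would show up in repeated propagation steps.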


Active Inference for Physical AI Agents -- An Engineering Perspective

de Vries, Bert

arXiv.org Machine Learning

Physical AI agents, such as robots and other embodied systems operating under tight and fluctuating resource constraints, remain far less capable than biological agents in open-ended real-world environments. This paper argues that Active Inference (AIF), grounded in the Free Energy Principle, offers a principled foundation for closing that gap. We develop this argument from first principles, following a chain from probability theory through Bayesian machine learning and variational inference to active inference and reactive message passing. From the FEP perspective, systems that maintain their structural and functional integrity over time can, under suitable assumptions, be described as minimizing variational free energy (VFE), and AIF operationalizes this by unifying perception, learning, planning, and control within a single computational objective. We show that VFE minimization is naturally realized by reactive message passing on factor graphs, where inference emerges from local, parallel computations. This realization is well matched to the constraints of physical operation, including hard deadlines, asynchronous data, fluctuating power budgets, and changing environments. Because reactive message passing is event-driven, interruptible, and locally adaptable, performance degrades gracefully under reduced resources while model structure can adjust online. We further show that, under suitable coupling and coarse-graining conditions, coupled AIF agents can be described as higher-level AIF agents, yielding a homogeneous architecture based on the same message-passing primitive across scales. Our contribution is not empirical benchmarking, but a clear theoretical and architectural case for the engineering community.


Uncertainty-Aware Attention for Reliable Interpretation and Prediction

Neural Information Processing Systems

The attention mechanism is effective in both focusing deep learning models on relevant features and interpreting them. However, attentions may be unreliable since the networks that generate them are often trained in a weakly-supervised manner. To overcome this limitation, we introduce the notion of input-dependent uncertainty to the attention mechanism, such that it generates attention for each feature with varying degrees of noise based on the given input, to learn larger variance on instances it is uncertain about. We learn this Uncertainty-aware Attention (UA) mechanism using variational inference, and validate it on various risk prediction tasks from electronic health records on which our model significantly outperforms existing attention models. The analysis of the learned attentions shows that our model generates attentions that comply with clinicians' interpretation, and provide richer interpretation via learned variance. Further evaluation of both the accuracy of the uncertainty calibration and the prediction performance with an "I don't know" decision shows that UA yields networks with high reliability as well.


Studying multiplicity: an interview with Prakhar Ganesh

AIHub

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. We sat down with Prakhar Ganesh to learn about his work on responsible AI, which is focussed on the concept of multiplicity. We found out more about some of the projects he's been involved in, his future plans, and how he got into the field. Could you start with a quick introduction to yourself, where you're studying, and the broad topic of your research? My name is Prakhar Ganesh. I'm also affiliated with Mila, which is a research institute in Montreal. My supervisor is Professor Golnoosh Farnadi.