Bayesian Inference
A Generalized Bias-Variance Decomposition for Bregman Divergences
The bias-variance decomposition is a central result in statistics and machine learning, but is typically presented only for the squared error. We present a generalization of the bias-variance decomposition where the prediction error is a Bregman divergence, which is relevant to maximum likelihood estimation with exponential families. While the result is already known, there was not previously a clear, standalone derivation, so we provide one for pedagogical purposes. A version of this note previously appeared on the author's personal website without context. Here we provide additional discussion and references to the relevant prior literature.
A metrological framework for uncertainty evaluation in machine learning classification models
Bilson, Samuel, Cox, Maurice, Pustogvar, Anna, Thompson, Andrew
Machine learning (ML) classification models are increasingly being used in a wide range of applications where it is important that predictions are accompanied by uncertainties, including in climate and earth observation, medical diagnosis and bioaerosol monitoring. The output of an ML classification model is a type of categorical variable known as a nominal property in the International Vocabulary of Metrology (VIM). However, concepts related to uncertainty evaluation for nominal properties are not defined in the VIM, nor is such evaluation addressed by the Guide to the Expression of Uncertainty in Measurement (GUM). In this paper we propose a metrological conceptual uncertainty evaluation framework for nominal properties. This framework is based on probability mass functions and summary statistics thereof, and it is applicable to ML classification. We also illustrate its use in the context of two applications that exemplify the issues and have significant societal impact, namely, climate and earth observation and medical diagnosis. Our framework would enable an extension of the GUM to uncertainty for nominal properties, which would make both applicable to ML classification models.
Trends in Motion Prediction Toward Deployable and Generalizable Autonomy: A Revisit and Perspectives
Wang, Letian, Lavoie, Marc-Antoine, Papais, Sandro, Nisar, Barza, Chen, Yuxiao, Ding, Wenhao, Ivanovic, Boris, Shao, Hao, Abuduweili, Abulikemu, Cook, Evan, Zhou, Yang, Karkus, Peter, Li, Jiachen, Liu, Changliu, Pavone, Marco, Waslander, Steven
Motion prediction, recently popularized under the term world models, refers to anticipating the future states of agents or the future evolution of a scene, which is rooted in human cognition to bridge perception and decision-making, enabling us to anticipate, adapt, and act within an ever-changing world. It lies at the core of intelligent autonomous systems, such as robotics and self-driving cars, to safely operate in dynamic and human-robot-mixed environments, and also informs broader time-series challenges. With advances in methods, representations, and datasets, the field has seen rapid progress, reflected in rapidly updated benchmark performance. However, when state-of-the-art methods are deployed in the real world, they are often found to struggle to generalize to open-world settings and fall short of deployment standards. This reveals a gap between reality and benchmarks, which are often idealized or ill-posed, and fail to capture real-world complexity. To address the pressing need for problem settings that better reflect real-world challenges and guide future research, this paper focuses on revisiting the generalization and applicability of motion prediction models, with an emphasis on robotics, autonomous driving, and human motion applications. We first provide a comprehensive taxonomy of motion prediction methods, covering representations, modelling methods, application domains, and evaluation protocols. We then revisit two fundamental problems: 1) how to push motion prediction models to be deployable to realistic deployment standards, where motion prediction does not act in a vacuum, but functions as one module of closed-loop autonomy stacks - it takes input from the localization and perception, and informs downstream planning and control.
Back to the Future: The Role of Past and Future Context Predictability in Incremental Language Production
Upadhye, Shiva, Futrell, Richard
Contextual predictability shapes both the form and choice of words in online language production. The effects of the predictability of a word given its previous context are generally well-understood in both production and comprehension, but studies of naturalistic production have also revealed a poorly-understood backward predictability effect of a word given its future context, which may be related to future planning. Here, in two studies of naturalistic speech corpora, we investigate backward predictability effects using improved measures and more powerful language models, introducing a new principled and conceptually motivated information-theoretic predictability measure that integrates predictability from both the future and the past context. Our first study revisits classic predictability effects on word duration. Our second study investigates substitution errors within a generative framework that independently models the effects of lexical, contextual, and communicative factors on word choice, while predicting the actual words that surface as speech errors. We find that our proposed conceptually-motivated alternative to backward predictability yields qualitatively similar effects across both studies. Through a fine-grained analysis of substitution errors, we further show that different kinds of errors are suggestive of how speakers prioritize form, meaning, and context-based information during lexical planning. Together, these findings illuminate the functional roles of past and future context in how speakers encode and choose words, offering a bridge between contextual predictability effects and the mechanisms of sentence planning.
Veli: Unsupervised Method and Unified Benchmark for Low-Cost Air Quality Sensor Correction
Dalbah, Yahia, Worring, Marcel, Hsu, Yen-Chia
Urban air pollution is a major health crisis causing millions of premature deaths annually, underscoring the urgent need for accurate and scalable monitoring of air quality (AQ). While low-cost sensors (LCS) offer a scalable alternative to expensive reference-grade stations, their readings are affected by drift, calibration errors, and environmental interference. To address these challenges, we introduce Veli (Reference-free Variational Estimation via Latent Inference), an unsupervised Bayesian model that leverages variational inference to correct LCS readings without requiring co-location with reference stations, eliminating a major deployment barrier. Specifically, Veli constructs a disentangled representation of the LCS readings, effectively separating the true pollutant reading from the sensor noise. To build our model and address the lack of standardized benchmarks in AQ monitoring, we also introduce the Air Quality Sensor Data Repository (AQ-SDR). AQ-SDR is the largest AQ sensor benchmark to date, with readings from 23,737 LCS and reference stations across multiple regions. Veli demonstrates strong generalization across both in-distribution and out-of-distribution settings, effectively handling sensor drift and erratic sensor behavior. Code for model and dataset will be made public when this paper is published.
Misaligned by Design: Incentive Failures in Machine Learning
Autor, David, Caplin, Andrew, Martin, Daniel, Marx, Philip
The cost of error in many high-stakes settings is asymmetric: misdiagnosing pneumonia when absent is an inconvenience, but failing to detect it when present can be life-threatening. Because of this, artificial intelligence (AI) models used to assist such decisions are frequently trained with asymmetric loss functions that incorporate human decision-makers' trade-offs between false positives and false negatives. In two focal applications, we show that this standard alignment practice can backfire. In both cases, it would be better to train the machine learning model with a loss function that ignores the human's objective and then adjust predictions ex post according to that objective. We rationalize this result using an economic model of incentive design with endogenous information acquisition. The key insight from our theoretical framework is that machine classifiers perform not one but two incentivized tasks: choosing how to classify and learning how to classify. We show that while the adjustments engineers use correctly incentivize choosing, they can simultaneously reduce the incentives to learn. Our formal treatment of the problem reveals that methods embraced for their intuitive appeal can in fact misalign human and machine objectives in predictable ways.
MULTI-LF: A Continuous Learning Framework for Real-Time Malicious Traffic Detection in Multi-Environment Networks
Rustam, Furqan, Obaidat, Islam, Jurcut, Anca Delia
Multi-environment (M-En) networks integrate diverse traffic sources, including Internet of Things (IoT) and traditional computing systems, creating complex and evolving conditions for malicious traffic detection. Existing machine learning (ML)-based approaches, typically trained on static single-domain datasets, often fail to generalize across heterogeneous network environments. To address this gap, we develop a realistic Docker-NS3-based testbed that emulates both IoT and traditional traffic conditions, enabling the generation and capture of live, labeled network flows. The resulting M-En Dataset combines this traffic with curated public PCAP traces to provide comprehensive coverage of benign and malicious behaviors. Building on this foundation, we propose Multi-LF, a real-time continuous learning framework that combines a lightweight model (M1) for rapid detection with a deeper model (M2) for high-confidence refinement and adaptation. A confidence-based coordination mechanism enhances efficiency without compromising accuracy, while weight interpolation mitigates catastrophic forgetting during continuous updates. Features extracted at 1-second intervals capture fine-grained temporal patterns, enabling early recognition of evolving attack behaviors. Implemented and evaluated within the Docker-NS3 testbed on live traffic, Multi-LF achieves an accuracy of 0.999 while requiring human intervention for only 0.0026 percent of packets, demonstrating its effectiveness and practicality for real-time malicious traffic detection in heterogeneous network environments.
Emergence of Goal-Directed Behaviors via Active Inference with Self-Prior
Kim, Dongmin, Kanazawa, Hoshinori, Yoshida, Naoto, Kuniyoshi, Yasuo
Infants often exhibit goal-directed behaviors, such as reaching for a sensory stimulus, even when no external reward criterion is provided. These intrinsically motivated behaviors facilitate spontaneous exploration and learning of the body and environment during early developmental stages. Although computational modeling can offer insight into the mechanisms underlying such behaviors, many existing studies on intrinsic motivation focus primarily on how exploration contributes to acquiring external rewards. In this paper, we propose a novel density model for an agent's own multimodal sensory experiences, called the "self-prior," and investigate whether it can autonomously induce goal-directed behavior. Integrated within an active inference framework based on the free energy principle, the self-prior generates behavioral references purely from an intrinsic process that minimizes mismatches between average past sensory experiences and current observations. This mechanism is also analogous to the acquisition and utilization of a body schema through continuous interaction with the environment. We examine this approach in a simulated environment and confirm that the agent spontaneously reaches toward a tactile stimulus. Our study implements intrinsically motivated behavior shaped by the agent's own sensory experiences, demonstrating the spontaneous emergence of intentional behavior during early development.
How Artificial Intelligence Leads to Knowledge Why: An Inquiry Inspired by Aristotle's Posterior Analytics
Eelink, Guus, Rückschloß, Kilian, Weitkämper, Felix
Bayesian networks and causal models provide frameworks for handling queries about external interventions and counterfactuals, enabling tasks that go beyond what probability distributions alone can address. While these formalisms are often informally described as capturing causal knowledge, there is a lack of a formal theory characterizing the type of knowledge required to predict the effects of external interventions. This work introduces the theoretical framework of causal systems to clarify Aristotle's distinction between knowledge that and knowledge why within artificial intelligence. By interpreting existing artificial intelligence technologies as causal systems, it investigates the corresponding types of knowledge. Furthermore, it argues that predicting the effects of external interventions is feasible only with knowledge why, providing a more precise understanding of the knowledge necessary for such tasks.
Robust Experimental Design via Generalised Bayesian Inference
Barlas, Yasir Zubayr, Sloman, Sabina J., Kaski, Samuel
Bayesian optimal experimental design is a principled framework for conducting experiments that leverages Bayesian inference to quantify how much information one can expect to gain from selecting a certain design. However, accurate Bayesian inference relies on the assumption that one's statistical model of the data-generating process is correctly specified. If this assumption is violated, Bayesian methods can lead to poor inference and estimates of information gain. Generalised Bayesian (or Gibbs) inference is a more robust probabilistic inference framework that replaces the likelihood in the Bayesian update by a suitable loss function. In this work, we present Generalised Bayesian Optimal Experimental Design (GBOED), an extension of Gibbs inference to the experimental design setting which achieves robustness in both design and inference. Using an extended information-theoretic framework, we derive a new acquisition function, the Gibbs expected information gain (Gibbs EIG). Our empirical results demonstrate that GBOED enhances robustness to outliers and incorrect assumptions about the outcome noise distribution.