 Uncertainty


Design Insights and Comparative Evaluation of a Hardware-Based Cooperative Perception Architecture for Lane Change Prediction

arXiv.org Artificial Intelligence

Traffic accidents remain a major global concern, with lane-change maneuvers recognized as one of the significant contributors to collision risk. Anticipating these maneuvers has become an important research focus, supporting both traffic safety and the safe integration of autonomous and assisted driving technologies. Over the past decade, numerous models have been developed for lane-change prediction. However, most existing works have been designed and validated using simulation environments or pre-recorded datasets. While these settings allow for benchmarking and controlled evaluation, they often rely on simplified assumptions about sensing, communication, and vehicle behavior that do not fully capture the complexity of real-world operation. Real-world deployments of lane-change prediction systems are relatively rare, and when they are reported, their practical challenges, limitations, and insights remain under-documented. To illustrate the setting more concretely, consider the left lane change scenario shown in Figure 1. The Ego Vehicle (EV) is driving in the left lane, while the Target Vehicle (TV) is moving in the right lane behind a Preceding Vehicle (PV). When the PV suddenly brakes, the TV must change lanes to avoid a collision.


Incomplete Data, Complete Dynamics: A Diffusion Approach

arXiv.org Artificial Intelligence

Learning physical dynamics from data is a fundamental challenge in machine learning and scientific modeling. Real-world observational data are inherently incomplete and irregularly sampled, posing significant challenges for existing data-driven approaches. In this work, we propose a principled diffusion-based framework for learning physical systems from incomplete training samples. To this end, our method strategically partitions each such sample into observed context and unobserved query components through a carefully designed splitting strategy, then trains a conditional diffusion model to reconstruct the missing query portions given available contexts. This formulation enables accurate imputation across arbitrary observation patterns without requiring complete data supervision. Specifically, we provide theoretical analysis demonstrating that our diffusion training paradigm on incomplete data achieves asymptotic convergence to the true complete generative process under mild regularity conditions. Empirically, we show that our method significantly outperforms existing baselines on synthetic and real-world physical dynamics benchmarks, including fluid flows and weather systems, with particularly strong performance in limited and irregular observation regimes. These results demonstrate the effectiveness of our theoretically principled approach for learning and imputing partially observed dynamics. Learning physical dynamics from observational data represents a cornerstone challenge in machine learning and scientific computing, with applications spanning weather forecasting (Conti, 2024; Zhang et al., 2025b), fluid dynamics (Wang et al., 2024; Brunton & Kutz, 2024), biological systems modeling (Qi et al., 2024; Goshisht, 2024), and beyond. 
Classical physics-based approaches require explicit specification of governing equations and boundary conditions, while data-driven methods offer the promise of discovering hidden dynamics directly from observations (Luo et al., 2025; Meng et al., 2025). However, a fundamental bottleneck persists: real-world observational data are inherently incomplete, irregularly sampled, and subject to various forms of missing information, making it difficult for existing approaches to learn accurate representations of the underlying dynamics.
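The context/query partition described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' splitting strategy: the function name `split_context_query`, the uniform random split, and the `query_fraction` parameter are all assumptions made for the example.

```python
import random

def split_context_query(observed_mask, query_fraction=0.5, seed=0):
    """Partition the observed entries of one sample into context and query.

    observed_mask: list of bools, True where the entry was actually observed.
    Returns two masks of the same length. Unobserved entries are False in both,
    so a training loss built on the query mask never touches unmeasured values.
    The uniform random split is an assumption, not the paper's strategy.
    """
    rng = random.Random(seed)
    observed_idx = [i for i, seen in enumerate(observed_mask) if seen]
    rng.shuffle(observed_idx)
    n_query = int(round(query_fraction * len(observed_idx)))
    query_idx = set(observed_idx[:n_query])

    context = [seen and i not in query_idx for i, seen in enumerate(observed_mask)]
    query = [i in query_idx for i in range(len(observed_mask))]
    return context, query

# Hypothetical sample with six entries, two of which were never observed.
mask = [True, False, True, True, False, True]
context_mask, query_mask = split_context_query(mask, query_fraction=0.5)
```

A conditional model would then be trained to reconstruct the entries selected by `query_mask` given those selected by `context_mask`; the two masks are disjoint and together cover exactly the observed entries.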


Simultaneous estimation of contact position and tool shape with high-dimensional parameters using force measurements and particle filtering

arXiv.org Artificial Intelligence

Estimating the contact state between a grasped tool and the environment is essential for performing contact tasks such as assembly and object manipulation. Force signals are valuable for estimating the contact state, as they can be utilized even when the contact location is obscured by the tool. Previous studies proposed methods for estimating contact positions using force/torque signals; however, most methods require the geometry of the tool surface to be known. Although several studies have proposed methods that do not require the tool shape, these methods require considerable time for estimation or are limited to tools with low-dimensional shape parameters. Here, we propose a method for simultaneously estimating the contact position and tool shape, where the tool shape is represented by a high-dimensional grid (more than 1000 dimensions). The proposed method uses a particle filter in which each particle has individual tool shape parameters, thereby avoiding the need to handle a high-dimensional parameter space directly. The proposed method is evaluated through simulations and experiments using tools with curved shapes on a plane. The results show that the proposed method can estimate the shape of the tool simultaneously with the contact positions, making the contact-position estimation more accurate.


$\sigma$-Maximal Ancestral Graphs

arXiv.org Artificial Intelligence

Maximal Ancestral Graphs (MAGs) provide an abstract representation of Directed Acyclic Graphs (DAGs) with latent (selection) variables. These graphical objects encode information about ancestral relations and d-separations of the DAGs they represent. This abstract representation has been used, among other things, to prove the soundness and completeness of the FCI algorithm for causal discovery, and to derive a do-calculus for its output. One significant inherent limitation of MAGs is that they rule out the possibility of cyclic causal relationships. In this work, we address that limitation. We introduce and study a class of graphical objects that we coin "$\sigma$-Maximal Ancestral Graphs" ("$\sigma$-MAGs"). We show how these graphs provide an abstract representation of (possibly cyclic) Directed Graphs (DGs) with latent (selection) variables, analogously to how MAGs represent DAGs. We study the properties of these objects and provide a characterization of their Markov equivalence classes.


Emergent Risk Awareness in Rational Agents under Resource Constraints

arXiv.org Artificial Intelligence

Advanced reasoning models with agentic capabilities (AI agents) are deployed to interact with humans and to solve sequential decision-making problems under (approximate) utility functions and internal models. When such problems have resource or failure constraints where action sequences may be forcibly terminated once resources are exhausted, agents face implicit trade-offs that reshape their utility-driven (rational) behaviour. Additionally, since these agents are typically commissioned by a human principal to act on their behalf, asymmetries in constraint exposure can give rise to previously unanticipated misalignment between human objectives and agent incentives. We formalise this setting through a survival bandit framework, provide theoretical and empirical results that quantify the impact of survival-driven preference shifts, identify conditions under which misalignment emerges and propose mechanisms to mitigate the emergence of risk-seeking or risk-averse behaviours. As a result, this work aims to increase understanding and interpretability of emergent behaviours of AI agents operating under such survival pressure, and offer guidelines for safely deploying such AI systems in critical resource-limited environments.
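The effect of forced termination on an agent's prospects can be illustrated with a toy calculation. The sketch below is not the paper's survival bandit formulation: it assumes a single arm that moves the budget by +1 with probability `p_win` and by -1 otherwise, and treats a budget of zero as termination. The function name and all parameter values are hypothetical.

```python
def survival_probability(budget, p_win, horizon):
    """Probability that the budget stays above zero for `horizon` pulls.

    Each pull adds +1 to the budget with probability p_win and -1 otherwise.
    The episode is terminated (ruin) as soon as the budget reaches zero.
    """
    # alive[b] = probability of still being alive with budget b so far
    alive = {budget: 1.0}
    for _ in range(horizon):
        nxt = {}
        for b, prob in alive.items():
            nxt[b + 1] = nxt.get(b + 1, 0.0) + prob * p_win
            if b - 1 > 0:  # reaching budget 0 means forced termination
                nxt[b - 1] = nxt.get(b - 1, 0.0) + prob * (1.0 - p_win)
        alive = nxt
    return sum(alive.values())

# The same arm looks very different depending on how much budget remains.
tight = survival_probability(budget=1, p_win=0.6, horizon=20)
loose = survival_probability(budget=5, p_win=0.6, horizon=20)
```

Under these assumptions an arm with positive expected reward is still risky when the budget is tight, which is the kind of survival-driven preference shift the abstract refers to.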


Lidar-based Tracking of Traffic Participants with Sensor Nodes in Existing Urban Infrastructure

arXiv.org Artificial Intelligence

This paper presents a lidar-only state estimation and tracking framework, along with a roadside sensing unit for integration with existing urban infrastructure. Urban deployments demand scalable, real-time tracking solutions, yet traditional remote sensing remains costly and computationally intensive, especially under perceptually degraded conditions. Our sensor node couples a single lidar with an edge computing unit and runs a computationally efficient, GPU-free observer that simultaneously estimates object state, class, dimensions, and existence probability. The pipeline performs: (i) state updates via an extended Kalman filter, (ii) dimension estimation using a 1D grid-map/Bayesian update, (iii) class updates via a lookup table driven by the most probable footprint, and (iv) existence estimation from track age and bounding-box consistency. Experiments in dynamic urban-like scenes with diverse traffic participants demonstrate real-time performance and high precision: the complete end-to-end pipeline finishes within 100 ms for 99.88% of messages, with an excellent detection rate. Robustness is further confirmed under simulated wind and sensor vibration. These results indicate that reliable, real-time roadside tracking is feasible on CPU-only edge hardware, enabling scalable, privacy-friendly deployments within existing city infrastructure. The framework integrates with existing poles, traffic lights, and buildings, reducing deployment costs and simplifying large-scale urban rollouts and maintenance efforts.
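Step (ii) of the pipeline, dimension estimation on a 1D grid with a Bayesian update, could look roughly like the sketch below. The grid resolution, the Gaussian measurement likelihood, the measurement values, and the names `bayes_grid_update` and `sigma` are assumptions for illustration; the paper's actual measurement model is not given in the abstract.

```python
import math

def bayes_grid_update(prior, grid, measurement, sigma=0.3):
    """One Bayesian update of a discrete belief over an object's length.

    prior: probabilities over the candidate lengths in `grid` (sums to 1).
    measurement: observed length in metres from one bounding box.
    sigma: assumed standard deviation of the measurement noise (hypothetical).
    """
    likelihood = [math.exp(-0.5 * ((measurement - g) / sigma) ** 2) for g in grid]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

grid = [1.0 + 0.5 * i for i in range(13)]   # candidate lengths 1.0 m to 7.0 m
belief = [1.0 / len(grid)] * len(grid)      # uniform prior
for z in [4.4, 4.6, 4.5]:                   # made-up length measurements
    belief = bayes_grid_update(belief, grid, z)

# Most probable length after three updates.
best_length = grid[belief.index(max(belief))]
```

A class lookup as in step (iii) could then be keyed on the most probable footprint, here represented only by `best_length`.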


From Samples to Scenarios: A New Paradigm for Probabilistic Forecasting

arXiv.org Artificial Intelligence

Most state-of-the-art probabilistic time series forecasting models rely on sampling to represent future uncertainty. However, this paradigm suffers from inherent limitations, such as lacking explicit probabilities, inadequate coverage, and high computational costs. In this work, we introduce Probabilistic Scenarios, an alternative paradigm designed to address the limitations of sampling. It operates by directly producing a finite set of {Scenario, Probability} pairs, thus avoiding Monte Carlo-like approximation. To validate this paradigm, we propose TimePrism, a simple model composed of only three parallel linear layers. Surprisingly, TimePrism achieves 9 out of 10 state-of-the-art results across five benchmark datasets on two metrics. The effectiveness of our paradigm comes from a fundamental reframing of the learning objective. Instead of modeling an entire continuous probability space, the model learns to represent a set of plausible scenarios and corresponding probabilities. Our work demonstrates the potential of the Probabilistic Scenarios paradigm, opening a promising research direction in forecasting beyond sampling.
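To make the output format concrete, the sketch below represents a forecast as a finite set of (scenario, probability) pairs and derives a point forecast and an explicit event probability from it. The scenario values, the scores, and the helper names are made up for illustration; this does not reproduce TimePrism's architecture or training objective.

```python
import math

def softmax(scores):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical scenarios for the next 4 time steps, each with a score.
scenarios = [
    [10.0, 11.0, 12.0, 13.0],   # steady growth
    [10.0, 10.0, 10.0, 10.0],   # flat
    [10.0, 8.0, 6.0, 4.0],      # decline
]
probabilities = softmax([1.0, 1.0, 0.0])

# Probability-weighted mean trajectory, usable as a point forecast.
horizon = len(scenarios[0])
mean_forecast = [
    sum(p * s[t] for p, s in zip(probabilities, scenarios))
    for t in range(horizon)
]

# Explicit probability of an event: the final value drops below 5.
p_below_5 = sum(p for p, s in zip(probabilities, scenarios) if s[-1] < 5.0)
```

Because each scenario carries its own probability, event probabilities such as `p_below_5` are read off directly instead of being approximated by counting samples.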


Consistent Estimation of Numerical Distributions under Local Differential Privacy by Wavelet Expansion

arXiv.org Artificial Intelligence

Distribution estimation under local differential privacy (LDP) is a fundamental and challenging task. Significant progress has been made on categorical data. However, due to different evaluation metrics, these methods do not work well when transferred to numerical data. In particular, we need to prevent the probability mass from being misplaced far away. In this paper, we propose a new approach that expresses the sample distribution using wavelet expansions. The coefficients of the wavelet series are estimated under LDP. Our method prioritizes the estimation of low-order coefficients in order to ensure accurate estimation at the macroscopic level. Therefore, the probability mass is prevented from being misplaced too far away from its ground truth. We establish theoretical guarantees for our methods. Experiments show that our wavelet expansion method significantly outperforms existing solutions under Wasserstein and KS distances.
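The concern about misplaced probability mass is easy to see with a small calculation. The sketch below computes the 1-Wasserstein distance between two histograms on the same equally spaced bins, using the standard identity that in one dimension it equals the integrated absolute difference of the two CDFs. The histograms and the function name are illustrative assumptions, not data or code from the paper.

```python
def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance between two histograms on the same equally spaced bins."""
    distance, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi                  # running difference of the two CDFs
        distance += abs(cdf_gap) * bin_width
    return distance

truth = [1.0, 0.0, 0.0, 0.0, 0.0]
near_miss = [0.0, 1.0, 0.0, 0.0, 0.0]       # all mass shifted by one bin
far_miss = [0.0, 0.0, 0.0, 0.0, 1.0]        # all mass shifted by four bins

d_near = wasserstein_1d(truth, near_miss)
d_far = wasserstein_1d(truth, far_miss)
```

Both estimates put all of their mass in the wrong bin, so a metric that only counts mismatched mass would rate them equally, while the Wasserstein distance grows with how far the mass was moved.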


GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models

arXiv.org Artificial Intelligence

We introduce GuessingGame, a protocol for evaluating large language models (LLMs) as strategic question-askers in open-ended, open-domain settings. A Guesser LLM identifies a hidden object by posing free-form questions to an Oracle without predefined choices or candidate lists. To measure question quality, we propose two information gain (IG) metrics: a Bayesian method that tracks belief updates over semantic concepts using LLM-scored relevance, and an entropy-based method that filters candidates via ConceptNet. Both metrics are model-agnostic and support post hoc analysis. Across 858 games with multiple models and prompting strategies, higher IG strongly predicts efficiency: a one-standard-deviation IG increase reduces expected game length by 43%. Prompting constraints guided by IG, such as enforcing question diversity, enable weaker models to significantly improve performance. These results show that question-asking in LLMs is both measurable and improvable, and crucial for interactive reasoning.
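The entropy-based metric can be illustrated with a toy calculation: if the belief is uniform over the remaining candidates, the information gained from an answer is the drop in entropy when the candidate set is filtered. The candidate list, the hand-written attribute sets, and the function names below are hypothetical and stand in for the ConceptNet-based filtering, which is not reproduced here.

```python
import math

def entropy_uniform(n):
    """Shannon entropy in bits of a uniform distribution over n candidates."""
    return math.log2(n) if n > 0 else 0.0

def information_gain(candidates, keep):
    """Entropy reduction when an answer filters the candidates to those passing `keep`."""
    remaining = [c for c in candidates if keep(c)]
    gain = entropy_uniform(len(candidates)) - entropy_uniform(len(remaining))
    return gain, remaining

# Hypothetical candidates with hand-written attributes.
attributes = {
    "apple": {"edible", "plant"},
    "carrot": {"edible", "plant"},
    "bread": {"edible"},
    "salmon": {"edible", "animal"},
    "hammer": {"tool"},
    "bicycle": {"vehicle"},
    "oak": {"plant"},
    "eagle": {"animal"},
}
candidates = list(attributes)

# "Is it edible?" answered yes, then "Is it an animal?" answered yes.
gain_1, after_1 = information_gain(candidates, lambda c: "edible" in attributes[c])
gain_2, after_2 = information_gain(after_1, lambda c: "animal" in attributes[c])
```

In this toy game the first question halves the candidate set (1 bit) and the second narrows four candidates to one (2 bits), so the second question is the more informative of the two.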


What Does Your Benchmark Really Measure? A Framework for Robust Inference of AI Capabilities

arXiv.org Artificial Intelligence

Evaluations of generative models on benchmark data are now ubiquitous, and their outcomes critically shape public and scientific expectations of AI's capabilities. Yet growing skepticism surrounds their reliability. How can we know that a reported accuracy genuinely reflects a model's true performance? Evaluations are often presented as simple measurements, but in reality they are inferences: to treat benchmark scores as evidence of capability is already to assume a theory of what capability is and how it manifests in a test. We make this step explicit by proposing a principled framework for evaluation as inference: begin from a theory of capability, and then derive methods for estimating it. This perspective, familiar in fields such as psychometrics, has not yet become commonplace in AI evaluation. As a proof of concept, we address a central challenge that undermines reliability: sensitivity to perturbations. After formulating a model of ability, we introduce methods that infer ability while accounting for uncertainty from sensitivity and finite samples, including an adaptive algorithm that significantly reduces sample complexity. Together, these contributions lay the groundwork for more reliable and trustworthy estimates of AI capabilities as measured through benchmarks.
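As a small illustration of why finite samples matter when reading a benchmark score, the sketch below computes a Wilson score interval for an observed accuracy. This is a standard binomial interval, not the inference method or the adaptive algorithm proposed in the paper, and it ignores perturbation sensitivity entirely; the function name and the example counts are assumptions.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p_hat = correct / total
    denom = 1.0 + z * z / total
    center = (p_hat + z * z / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total))
        / denom
    )
    return center - half_width, center + half_width

# Hypothetical result: 80 correct answers out of 100 benchmark items.
low, high = wilson_interval(80, 100)

# The same accuracy measured on 10,000 items gives a much tighter interval.
low_big, high_big = wilson_interval(8000, 10000)
```

Even before perturbations are considered, a reported accuracy of 80% on 100 items is compatible with a fairly wide range of underlying performance, which is one reason to treat benchmark scores as inferences.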