belief update
Approximate inference of marginals using the IBIA framework
Exact inference of marginals in probabilistic graphical models (PGM) is known to be intractable, necessitating the use of approximate methods. Most of the existing variational techniques perform iterative message passing in loopy graphs which is slow to converge for many benchmarks. In this paper, we propose a new algorithm for marginal inference that is based on the incremental build-infer-approximate (IBIA) paradigm. Our algorithm converts the PGM into a sequence of linked clique tree forests (SLCTF) with bounded clique sizes, and then uses a heuristic belief update algorithm to infer the marginals. For the special case of Bayesian networks, we show that if the incremental build step in IBIA uses the topological order of variables then (a) the prior marginals are consistent in all CTFs in the SLCTF and (b) the posterior marginals are consistent once all evidence variables are added to the SLCTF. In our approach, the belief propagation step is non-iterative and the accuracy-complexity trade-off is controlled using user-defined clique size bounds. Results for several benchmark sets from recent UAI competitions show that our method gives either better or comparable accuracy than existing variational and sampling based methods, with smaller runtimes.
Motion Planning Under Temporal Logic Specifications In Semantically Unknown Environments
Taheri, Azizollah, Aksaray, Derya
This paper addresses a motion planning problem to achieve spatio-temporal-logical tasks, expressed by syntactically co-safe linear temporal logic specifications (scLTL\next), in uncertain environments. Here, the uncertainty is modeled as some probabilistic knowledge on the semantic labels of the environment. For example, the task is "first go to region 1, then go to region 2"; however, the exact locations of regions 1 and 2 are not known a priori, instead a probabilistic belief is available. We propose a novel automata-theoretic approach, where a special product automaton is constructed to capture the uncertainty related to semantic labels, and a reward function is designed for each edge of this product automaton. The proposed algorithm utilizes value iteration for online replanning. We show some theoretical results and present some simulations/experiments to demonstrate the efficacy of the proposed approach.
Planning Ahead with RSA: Efficient Signalling in Dynamic Environments by Projecting User Awareness across Future Timesteps
Das, Anwesha, Duff, John, Hoffmann, Jörg, Demberg, Vera
Adaptive agent design offers a way to improve human-AI collaboration on time-sensitive tasks in rapidly changing environments. In such cases, to ensure the human maintains an accurate understanding of critical task elements, an assistive agent must not only identify the highest priority information but also estimate how and when this information can be communicated most effectively, given that human attention represents a zero-sum cognitive resource where focus on one message diminishes awareness of other or upcoming information. We introduce a theoretical framework for adaptive signalling which meets these challenges by using principles of rational communication, formalised as Bayesian reference resolution using the Rational Speech Act (RSA) modelling framework, to plan a sequence of messages which optimise timely alignment between user belief and a dynamic environment. The agent adapts message specificity and timing to the particulars of a user and scenario based on projections of how prior-guided interpretation of messages will influence attention to the interface and subsequent belief update, across several timesteps out to a fixed horizon. In a comparison to baseline methods, we show that this effectiveness depends crucially on combining multi-step planning with a realistic model of user awareness. As the first application of RSA for communication in a dynamic environment, and for human-AI interaction in general, we establish theoretical foundations for pragmatic communication in human-agent teams, highlighting how insights from cognitive science can be capitalised to inform the design of assistive agents.
DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
Wu, Yuheng, Xie, Jianwen, Zhang, Denghui, Xu, Zhaozhuo
Theory-of-Mind (ToM) tasks pose a unique challenge for large language models (LLMs), which often lack the capability for dynamic logical reasoning. In this work, we propose DEL-ToM, a framework that improves verifiable ToM reasoning through inference-time scaling rather than architectural changes. Our approach decomposes ToM tasks into a sequence of belief updates grounded in Dynamic Epistemic Logic (DEL), enabling structured and verifiable dynamic logical reasoning. We use data generated automatically via a DEL simulator to train a verifier, which we call the Process Belief Model (PBM), to score each belief update step. During inference, the PBM evaluates candidate belief traces from the LLM and selects the highest-scoring one. This allows LLMs to allocate extra inference-time compute to yield more transparent reasoning. Experiments across model scales and benchmarks show that DEL-ToM consistently improves performance, demonstrating that verifiable belief supervision significantly enhances LLMs' ToM capabilities without retraining. Code is available at https://github.com/joel-wu/DEL-ToM.
Hybrid quantum-classical algorithm for near-optimal planning in POMDPs
Cunha, Gilberto, Ramôa, Alexandra, Sequeira, André, de Oliveira, Michael, Barbosa, Luís
Reinforcement learning (RL) provides a principled framework for decision-making in partially observable environments, which can be modeled as Markov decision processes and compactly represented through dynamic decision Bayesian networks. Recent advances demonstrate that inference on sparse Bayesian networks can be accelerated using quantum rejection sampling combined with amplitude amplification, leading to a computational speedup in estimating acceptance probabilities.\\ Building on this result, we introduce Quantum Bayesian Reinforcement Learning (QBRL), a hybrid quantum-classical look-ahead algorithm for model-based RL in partially observable environments. We present a rigorous, oracle-free time complexity analysis under fault-tolerant assumptions for the quantum device. Unlike standard treatments that assume a black-box oracle, we explicitly specify the inference process, allowing our bounds to more accurately reflect the true computational cost. We show that, for environments whose dynamics form a sparse Bayesian network, horizon-based near-optimal planning can be achieved sub-quadratically faster through quantum-enhanced belief updates. Furthermore, we present numerical experiments benchmarking QBRL against its classical counterpart on simple yet illustrative decision-making tasks. Our results offer a detailed analysis of how the quantum computational advantage translates into decision-making performance, highlighting that the magnitude of the advantage can vary significantly across different deployment settings.
Doing Experiments and Revising Rules with Natural Language and Probabilistic Reasoning
We give a model of how to infer natural language rules by doing experiments. We conduct a human-model comparison on aZendo-style task, finding that a critical ingredient for modeling the human data is toassume that humans also consider fuzzy, probabilistic rules, in addition to assumingthat humans perform approximately-Bayesian belief updates. We also comparewith recent algorithms for using LLMs to generate and revise hypotheses, findingthat our online inference method yields higher accuracy at recovering the trueunderlying rule, and provides better support for designing optimal experiments.
Off-Policy Evaluation for Sequential Persuasion Process with Unobserved Confounding
S., Nishanth Venkatesh, Bang, Heeseung, Malikopoulos, Andreas A.
-- In this paper, we expand the Bayesian persuasion framework to account for unobserved confounding variables in sender-receiver interactions. While traditional models typically assume that belief updates follow Bayesian principles, real-world scenarios often involve hidden variables that impact the receiver's belief formation and decision-making. Crucially, the receiver's belief update is affected by an unobserved confounding variable. By reformulating this scenario as a Partially Observable Markov Decision Process (POMDP), we capture the sender's incomplete information regarding both the dynamics of the receiver's beliefs and the unobserved confounder . We prove that finding an optimal observation-based policy in this POMDP is equivalent to solving for an optimal signaling strategy in the original persuasion framework. Furthermore, we demonstrate how this reformulation facilitates the application of proximal learning for off-policy evaluation (OPE) in the persuasion process. This advancement enables the sender to evaluate alternative signaling strategies using only observational data from a behavioral policy, thus eliminating the necessity for costly new experiments. Strategic information sharing plays a critical role in economic interactions, policy design, and multi-agent systems [1]-[3]. Bayesian persuasion was first introduced by Ka-menica and Gentzkow [4] as a powerful framework for analyzing how a sender can strategically reveal information to influence a receiver's decisions. In the standard setting, a sender commits to an information disclosure policy before observing the state of the world, and the receiver, after observing the sender's message, forms posterior beliefs and takes an action that affects both the sender's and the receiver's utilities. Despite its theoretical elegance, Bayesian persuasion rests on assumptions that may not hold in practical settings. First, the framework presupposes that the sender possesses complete information about the receiver, including their observation process and all features that influence their decision-making (including utility functions).