Markov Models
Information Compression in Dynamic Games
Tang, Dengwang, Subramanian, Vijay, Teneketzis, Demosthenis
One of the reasons why stochastic dynamic games with an underlying dynamic system are challenging is since strategic players have access to enormous amount of information which leads to the use of extremely complex strategies at equilibrium. One approach to resolve this challenge is to simplify players' strategies by identifying appropriate compression of information maps so that the players can make decisions solely based on the compressed version of information, called the information state. For finite dynamic games with asymmetric information, inspired by the notion of information state for single-agent control problems, we propose two notions of information states, namely mutually sufficient information (MSI) and unilaterally sufficient information (USI). Both these information states are obtained with information compression maps independent of the strategy profile. We show that Bayes-Nash Equilibria (BNE) and Sequential Equilibria (SE) exist when all players use MSI-based strategies. We prove that when all players employ USI-based strategies the resulting sets of BNE and SE payoff profiles are the same as the sets of BNE and SE payoff profiles resulting when all players use full information-based strategies. We prove that when all players use USI-based strategies the resulting set of weak Perfect Bayesian Equilibrium (wPBE) payoff profiles can be a proper subset of all wPBE payoff profiles. We identify MSI and USI in specific models of dynamic games in the literature. We end by presenting an open problem: Do there exist strategy-dependent information compression maps that guarantee the existence of at least one equilibrium or maintain all equilibria that exist under perfect recall? We show, by a counterexample, that a well-known strategy-dependent information compression map used in the literature does not possess any of the properties of MSI or USI.
Variable-Agnostic Causal Exploration for Reinforcement Learning
Nguyen, Minh Hoang, Le, Hung, Venkatesh, Svetha
Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration due to extensive trial-and-error actions. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions of causal variables in the environments. In this paper, we introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL), incorporating causal relationships to drive exploration in RL without specifying environmental causal variables. Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. Subsequently, it constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion. This can be leveraged to generate intrinsic rewards or establish a hierarchy of subgoals to enhance exploration efficiency. Experimental results showcase a significant improvement in agent performance in grid-world, 2d games and robotic domains, particularly in scenarios with sparse rewards and noisy actions, such as the notorious Noisy-TV environments.
Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments
Chen, Runfa, Wang, Ling, Du, Yu, Xue, Tianrui, Sun, Fuchun, Zhang, Jianwei, Huang, Wenbing
Learning policies for multi-entity systems in 3D environments is far more complicated against single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution of alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations and rotations. To this end, this paper proposes Subequivariant Hierarchical Neural Networks (SHNN) to facilitate multi-entity policy learning. In particular, SHNN first dynamically decouples the global space into local entity-level graphs via task assignment. Second, it leverages subequivariant message passing over the local entity-level graphs to devise local reference frames, remarkably compressing the representation redundancy, particularly in gravity-affected environments. Furthermore, to overcome the limitations of existing benchmarks in capturing the subtleties of multi-entity systems under the Euclidean symmetry, we propose the Multi-entity Benchmark (MEBEN), a new suite of environments tailored for exploring a wide range of multi-entity reinforcement learning. Extensive experiments demonstrate significant advancements of SHNN on the proposed benchmarks compared to existing methods. Comprehensive ablations are conducted to verify the indispensability of task assignment and subequivariance.
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Almuzairee, Abdulaziz, Hansen, Nicklas, Christensen, Henrik I.
Q-learning algorithms are appealing for real-world applications due to their data-efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation, and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing these limitations, we propose a generalized recipe, SADA, that works with wider varieties of augmentations. We benchmark its effectiveness on DMC-GB2 - our proposed extension of the popular DMControl Generalization Benchmark - as well as tasks from Meta-World and the Distracting Control Suite, and find that our method, SADA, greatly improves training stability and generalization of RL agents across a diverse set of augmentations. For visualizations, code and benchmark: see https://aalmuzairee.github.io/SADA/
Satisficing Exploration for Deep Reinforcement Learning
Arumugam, Dilip, Kumar, Saurabh, Gummadi, Ramki, Van Roy, Benjamin
A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas with function approximation and high-dimensional observations. In this work, we remedy this issue by extending an agent that directly represents uncertainty over the optimal value function allowing it to both bypass the need for model-based planning and to learn satisficing policies. We provide simple yet illustrative experiments that demonstrate how our algorithm enables deep reinforcement-learning agents to achieve satisficing behaviors. In keeping with previous work on this setting for multi-armed bandits, we additionally find that our algorithm is capable of synthesizing optimal behaviors, when feasible, more efficiently than its non-information-theoretic counterpart.
Semi-Supervised Generative Models for Disease Trajectories: A Case Study on Systemic Sclerosis
Trottet, Cรฉcile, Schรผrch, Manuel, Allam, Ahmed, Barua, Imon, Petelytska, Liubov, Distler, Oliver, Hoffmann-Vold, Anna-Maria, Krauthammer, Michael, collaborators, the EUSTAR
We propose a deep generative approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories, with a particular focus on Systemic Sclerosis (SSc). We aim to learn temporal latent representations of the underlying generative process that explain the observed patient disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical knowledge. By combining the generative approach with medical definitions of different characteristics of SSc, we facilitate the discovery of new aspects of the disease. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering SSc patient trajectories into novel sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series with uncertainty quantification.
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Cao, Ruisheng, Lei, Fangyu, Wu, Haoyuan, Chen, Jixuan, Fu, Yeqiao, Gao, Hongcheng, Xiong, Xinzhuang, Zhang, Hanchong, Mao, Yuchen, Hu, Wenjing, Xie, Tianbao, Xu, Hongshen, Zhang, Danyang, Wang, Sida, Sun, Ruoxi, Yin, Pengcheng, Xiong, Caiming, Ni, Ansong, Liu, Qian, Zhong, Victor, Chen, Lu, Yu, Kai, Yu, Tao
Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivity of experts while democratizing access to large-scale data analysis. In this paper, we introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering workflows, featuring 494 real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems. To balance realistic simulation with evaluation simplicity, we devote significant effort to developing automatic configurations for task setup and carefully crafting evaluation metrics for each task. Furthermore, we supplement multimodal agents with comprehensive documents of these enterprise data software systems. Our empirical evaluation reveals that existing state-of-the-art LLM/VLM-based agents do not reliably automate full data workflows (14.0% success). Even with step-by-step guidance, these agents still underperform in tasks that require fine-grained, knowledge-intensive GUI actions (16.2%) and involve remote cloud-hosted workspaces (10.6%). We hope that Spider2-V paves the way for autonomous multimodal agents to transform the automation of data science and engineering workflow. Our code and data are available at https://spider2-v.github.io.
Data-Driven Abstractions via Binary-Tree Gaussian Processes for Formal Verification
Schรถn, Oliver, Naseer, Shammakh, Wooding, Ben, Soudjani, Sadegh
To advance formal verification of stochastic systems against temporal logic requirements for handling unknown dynamics, researchers have been designing data-driven approaches inspired by breakthroughs in the underlying machine learning techniques. As one promising research direction, abstraction-based solutions based on Gaussian process (GP) regression have become popular for their ability to learn a representation of the latent system from data with a quantified error. Results obtained based on this model are then translated to the true system via various methods. In a recent publication, GPs using a so-called binary-tree kernel have demonstrated a polynomial speedup w.r.t. the size of the data compared to their vanilla version, outcompeting all existing sparse GP approximations. Incidentally, the resulting binary-tree Gaussian process (BTGP) is characteristic for its piecewise-constant posterior mean and covariance functions, naturally abstracting the input space into discrete partitions. In this paper, we leverage this natural abstraction of the BTGP for formal verification, eliminating the need for cumbersome abstraction and error quantification procedures. We show that the BTGP allows us to construct an interval Markov chain model of the unknown system with a speedup that is polynomial w.r.t. the size of the abstraction compared to alternative approaches. We provide a delocalized error quantification via a unified formula even when the true dynamics do not live in the function space of the BTGP. This allows us to compute upper and lower bounds on the probability of satisfying reachability specifications that are robust to both aleatoric and epistemic uncertainties.
Deflated Dynamics Value Iteration
Lee, Jongmin, Rakhsha, Amin, Ryu, Ernest K., Farahmand, Amir-massoud
The Value Iteration (VI) algorithm is an iterative procedure to compute the value function of a Markov decision process, and is the basis of many reinforcement learning (RL) algorithms as well. As the error convergence rate of VI as a function of iteration $k$ is $O(\gamma^k)$, it is slow when the discount factor $\gamma$ is close to $1$. To accelerate the computation of the value function, we propose Deflated Dynamics Value Iteration (DDVI). DDVI uses matrix splitting and matrix deflation techniques to effectively remove (deflate) the top $s$ dominant eigen-structure of the transition matrix $\mathcal{P}^{\pi}$. We prove that this leads to a $\tilde{O}(\gamma^k |\lambda_{s+1}|^k)$ convergence rate, where $\lambda_{s+1}$is $(s+1)$-th largest eigenvalue of the dynamics matrix. We then extend DDVI to the RL setting and present Deflated Dynamics Temporal Difference (DDTD) algorithm. We empirically show the effectiveness of the proposed algorithms.
Walking the Values in Bayesian Inverse Reinforcement Learning
Bajgar, Ondrej, Abate, Alessandro, Gatsis, Konstantinos, Osborne, Michael A.
The goal of Bayesian inverse reinforcement learning (IRL) is recovering a posterior distribution over reward functions using a set of demonstrations from an expert optimizing for a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, often defined in terms of Q values: vanilla Bayesian IRL needs to solve the costly forward planning problem - going from rewards to the Q values - at every step of the algorithm, which may need to be done thousands of times. We propose to solve this by a simple change: instead of focusing on primarily sampling in the space of rewards, we can focus on primarily working in the space of Q-values, since the computation required to go from Q-values to reward is radically cheaper. Furthermore, this reversion of the computation makes it easy to compute the gradient allowing efficient sampling using Hamiltonian Monte Carlo. We propose ValueWalk - a new Markov chain Monte Carlo method based on this insight - and illustrate its advantages on several tasks.