Markov Models
GKNet: Graph Kalman Filtering and Model Inference via Model-based Deep Learning
Sabbaqi, Mohammad, Taormina, Riccardo, Isufi, Elvin
Inference tasks with time series over graphs are of importance in applications such as urban water networks, economics, and networked neuroscience. Addressing these tasks typically relies on identifying a computationally affordable model that jointly captures the graph-temporal patterns of the data. In this work, we propose a graph-aware state space model for graph time series, where both the latent state and the observation equation are parametric graph-induced models with a limited number of parameters that need to be learned. More specifically, we consider the state equation to follow a stochastic partial differential equation driven by noise over the graphs edges accounting not only for potential edge uncertainties but also for increasing the degrees of freedom in the latter in a tractable manner. The graph structure conditioning of the noise dispersion allows the state variable to deviate from the stochastic process in certain neighborhoods. The observation model is a sampled and graph-filtered version of the state capturing multi-hop neighboring influence. The goal is to learn the parameters in both state and observation models from the partially observed data for downstream tasks such as prediction and imputation. The model is inferred first through a maximum likelihood approach that provides theoretical tractability but is limited in expressivity and scalability. To improve on the latter, we use the state-space formulation to build a principled deep learning architecture that jointly learns the parameters and tracks the state in an end-to-end manner in the spirit of Kalman neural networks.
Advancements and Challenges in Continual Reinforcement Learning: A Comprehensive Review
Zuffer, Amara, Burke, Michael, Harandi, Mehrtash
The diversity of tasks and dynamic nature of reinforcement learning (RL) require RL agents to be able to learn sequentially and continuously, a learning paradigm known as continuous reinforcement learning. This survey reviews how continual learning transforms RL agents into dynamic continual learners. This enables RL agents to acquire and retain useful and reusable knowledge seamlessly. The paper delves into fundamental aspects of continual reinforcement learning, exploring key concepts, significant challenges, and novel methodologies. Special emphasis is placed on recent advancements in continual reinforcement learning within robotics, along with a succinct overview of evaluation environments utilized in prominent research, facilitating accessibility for newcomers to the field. The review concludes with a discussion on limitations and promising future directions, providing valuable insights for researchers and practitioners alike.
Multi-agent Markov Entanglement
Value decomposition has long been a fundamental technique in multi-agent dynamic programming and reinforcement learning (RL). Specifically, the value function of a global state $(s_1,s_2,\ldots,s_N)$ is often approximated as the sum of local functions: $V(s_1,s_2,\ldots,s_N)\approx\sum_{i=1}^N V_i(s_i)$. This approach traces back to the index policy in restless multi-armed bandit problems and has found various applications in modern RL systems. However, the theoretical justification for why this decomposition works so effectively remains underexplored. In this paper, we uncover the underlying mathematical structure that enables value decomposition. We demonstrate that a multi-agent Markov decision process (MDP) permits value decomposition if and only if its transition matrix is not "entangled" -- a concept analogous to quantum entanglement in quantum physics. Drawing inspiration from how physicists measure quantum entanglement, we introduce how to measure the "Markov entanglement" for multi-agent MDPs and show that this measure can be used to bound the decomposition error in general multi-agent MDPs. Using the concept of Markov entanglement, we proved that a widely-used class of index policies is weakly entangled and enjoys a sublinear $\mathcal O(\sqrt{N})$ scale of decomposition error for $N$-agent systems. Finally, we show how Markov entanglement can be efficiently estimated in practice, providing practitioners with an empirical proxy for the quality of value decomposition.
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Sun, Qiushi, Liu, Zhoumianze, Ma, Chang, Ding, Zichen, Xu, Fangzhi, Yin, Zhangyue, Zhao, Haiteng, Wu, Zhenyu, Cheng, Kanzhi, Liu, Zhaoyang, Wang, Jianing, Li, Qintong, Tang, Xiangru, Xie, Tianbao, Feng, Xiachong, Li, Xiang, Kao, Ben, Wang, Wenhai, Qi, Biqing, Kong, Lingpeng, Wu, Zhiyong
Large Language Models (LLMs) have extended their impact beyond Natural Language Processing, substantially fostering the development of interdisciplinary research. Recently, various LLM-based agents have been developed to assist scientific discovery progress across multiple aspects and domains. Among these, computer-using agents, capable of interacting with operating systems as humans do, are paving the way to automated scientific problem-solving and addressing routines in researchers' workflows. Recognizing the transformative potential of these agents, we introduce ScienceBoard, which encompasses two complementary contributions: (i) a realistic, multi-domain environment featuring dynamic and visually rich scientific workflows with integrated professional software, where agents can autonomously interact via different interfaces to accelerate complex research tasks and experiments; and (ii) a challenging benchmark of 169 high-quality, rigorously validated real-world tasks curated by humans, spanning scientific-discovery workflows in domains such as biochemistry, astronomy, and geoinformatics. Extensive evaluations of agents with state-of-the-art backbones (e.g., GPT-4o, Claude 3.7, UI-TARS) show that, despite some promising results, they still fall short of reliably assisting scientists in complex workflows, achieving only a 15% overall success rate. In-depth analysis further provides valuable insights for addressing current agent limitations and more effective design principles, paving the way to build more capable agents for scientific discovery. Our code, environment, and benchmark are at https://qiushisun.github.io/ScienceBoard-Home/.
Beyond Reactive Safety: Risk-Aware LLM Alignment via Long-Horizon Simulation
Sun, Chenkai, Zhang, Denghui, Zhai, ChengXiang, Ji, Heng
Given the growing influence of language model-based agents on high-stakes societal decisions, from public policy to healthcare, ensuring their beneficial impact requires understanding the far-reaching implications of their suggestions. We propose a proof-of-concept framework that projects how model-generated advice could propagate through societal systems on a macroscopic scale over time, enabling more robust alignment. To assess the long-term safety awareness of language models, we also introduce a dataset of 100 indirect harm scenarios, testing models' ability to foresee adverse, non-obvious outcomes from seemingly harmless user prompts. Our approach achieves not only over 20% improvement on the new dataset but also an average win rate exceeding 70% against strong baselines on existing safety benchmarks (AdvBench, SafeRLHF, WildGuardMix), suggesting a promising direction for safer agents.
Faster Fixed-Point Methods for Multichain MDPs
We study value-iteration (VI) algorithms for solving general (a.k.a. multichain) Markov decision processes (MDPs) under the average-reward criterion, a fundamental but theoretically challenging setting. Beyond the difficulties inherent to all average-reward problems posed by the lack of contractivity and non-uniqueness of solutions to the Bellman operator, in the multichain setting an optimal policy must solve the navigation subproblem of steering towards the best connected component, in addition to optimizing long-run performance within each component. We develop algorithms which better solve this navigational subproblem in order to achieve faster convergence for multichain MDPs, obtaining improved rates of convergence and sharper measures of complexity relative to prior work. Many key components of our results are of potential independent interest, including novel connections between average-reward and discounted problems, optimal fixed-point methods for discounted VI which extend to general Banach spaces, new sublinear convergence rates for the discounted value error, and refined suboptimality decompositions for multichain MDPs. Overall our results yield faster convergence rates for discounted and average-reward problems and expand the theoretical foundations of VI approaches.
Gaussian Invariant Markov Chain Monte Carlo
Titsias, Michalis K., Alexopoulos, Angelos, Liu, Siran, Dellaportas, Petros
We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that Gaussian invariant sampling can lead to ergodic estimators with improved statistical efficiency. This is due to a remarkable property of Gaussian invariance that allows us to obtain exact analytical solutions to the Poisson equation for Gaussian targets. These solutions can be used to construct efficient and easy to use control variates for variance reduction of estimators under any intractable target. We demonstrate the new samplers and estimators in several examples, including high dimensional targets in latent Gaussian models where we compare against several advanced methods and obtain state-of-the-art results. We also provide theoretical results regarding geometric ergodicity, and an optimal scaling analysis that shows the dependence of the optimal acceptance rate on the Gaussianity of the target.
Action-Minimization Meets Generative Modeling: Efficient Transition Path Sampling with the Onsager-Machlup Functional
Raja, Sanjeev, ล รญpka, Martin, Psenka, Michael, Kreiman, Tobias, Pavelka, Michal, Krishnapriyan, Aditi S.
Transition path sampling (TPS), which involves finding probable paths connecting two points on an energy landscape, remains a challenge due to the complexity of real-world atomistic systems. Current machine learning approaches use expensive, task-specific, and data-free training procedures, limiting their ability to benefit from high-quality datasets and large-scale pre-trained models. In this work, we address TPS by interpreting candidate paths as trajectories sampled from stochastic dynamics induced by the learned score function of pre-trained generative models, specifically denoising diffusion and flow matching. Under these dynamics, finding high-likelihood transition paths becomes equivalent to minimizing the Onsager-Machlup (OM) action functional. This enables us to repurpose pre-trained generative models for TPS in a zero-shot manner, in contrast with bespoke, task-specific approaches in previous work. We demonstrate our approach on varied molecular systems, obtaining diverse, physically realistic transition pathways and generalizing beyond the pre-trained model's original training dataset. Our method can be easily incorporated into new generative models, making it practically relevant as models continue to scale and improve with increased data availability. Code is available at github.com/ASK-Berkeley/OM-TPS.
Multi-convex Programming for Discrete Latent Factor Models Prototyping
Zhu, Hao, Yan, Shengchao, Hoffmann, Jasper, Boedecker, Joschka
Discrete latent factor models (DLFMs) are widely used in various domains such as machine learning, economics, neuroscience, psychology, etc. Currently, fitting a DLFM to some dataset relies on a customized solver for individual models, which requires lots of effort to implement and is limited to the targeted specific instance of DLFMs. In this paper, we propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem of a wide range of DLFMs, including both regression and classification models, within a very short script. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors, such that the users can easily prototype the DLFM structure according to their dataset and application scenario. We introduce our open-source Python implementation and illustrate the framework in several examples.
Learning-Based Resource Management in Integrated Sensing and Communication Systems
Lu, Ziyang, Gursoy, M. Cenk, Mohan, Chilukuri K., Varshney, Pramod K.
-- In this paper, we tackle the task of adaptive time allocation in integrated sensing and communication systems equipped with radar and communication units. The dual-functional radar-communication system's task involves allocating dwell times for tracking multiple targets and utilizing the remaining time for data transmission towards estimated target locations. We introduce a novel constrained deep reinforcement learning (CDRL) approach, designed to optimize resource allocation between tracking and communication under time budget constraints, thereby enhancing target communication quality. Our numerical results demonstrate the efficiency of our proposed CDRL framework, confirming its ability to maximize communication quality in highly dynamic environments while adhering to time constraints. A. Background 1) Cognitive Radar: Radar technology, integral to various applications in environmental sensing, space exploration, navigation, and traffic control, has become increasingly important with the emergence of autonomous vehicles and drones.