Markov Models
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Drouin, Alexandre, Gasse, Maxime, Caccia, Massimo, Laradji, Issam H., Del Verme, Manuel, Marty, Tom, Boisvert, Léo, Thakkar, Megh, Cappart, Quentin, Vazquez, David, Chapados, Nicolas, Lacoste, Alexandre
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.
Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning
Bi, Xiaojun, He, Mingjie, Sun, Yiwen
Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has become a major research focus due to its advantages in data acquisition and interpretability. However, current models often overlook collaboration, which affects not only impacts overall traffic efficiency but also hinders the vehicle's own normal driving in the long run. To address the aforementioned issue, this paper proposes a method named Mix Q-learning for Lane Changing(MQLC) that integrates a hybrid value Q network, taking into account both collective and individual benefits for the greater good. At the collective level, our method coordinates the individual Q and global Q networks by utilizing global information. This enables agents to effectively balance their individual interests with the collective benefit. At the individual level, we integrated a deep learning-based intent recognition module into our observation and enhanced the decision network. These changes provide agents with richer decision information and more accurate feature extraction for improved lane-changing decisions. This strategy enables the multi-agent system to learn and formulate optimal decision-making strategies effectively. Our MQLC model, through extensive experimental results, impressively outperforms other state-of-the-art multi-agent decision-making methods, achieving significantly safer and faster lane-changing decisions.
The data augmentation algorithm
Roy, Vivekananda, Khare, Kshitij, Hobert, James P.
The data augmentation (DA) algorithms are popular Markov chain Monte Carlo (MCMC) algorithms often used for sampling from intractable probability distributions. This review article comprehensively surveys DA MCMC algorithms, highlighting their theoretical foundations, methodological implementations, and diverse applications in frequentist and Bayesian statistics. The article discusses tools for studying the convergence properties of DA algorithms. Furthermore, it contains various strategies for accelerating the speed of convergence of the DA algorithms, different extensions of DA algorithms and outlines promising directions for future research. This paper aims to serve as a resource for researchers and practitioners seeking to leverage data augmentation techniques in MCMC algorithms by providing key insights and synthesizing recent developments.
Active Inference Meeting Energy-Efficient Control of Parallel and Identical Machines
Yeganeh, Yavar Taheri, Jafari, Mohsen, Matta, Andrea
We investigate the application of active inference in developing energy-efficient control agents for manufacturing systems. Active inference, rooted in neuroscience, provides a unified probabilistic framework integrating perception, learning, and action, with inherent uncertainty quantification elements. Our study explores deep active inference, an emerging field that combines deep learning with the active inference decision-making framework. Leveraging a deep active inference agent, we focus on controlling parallel and identical machine workstations to enhance energy efficiency. We address challenges posed by the problem's stochastic nature and delayed policy response by introducing tailored enhancements to existing agent architectures. Specifically, we introduce multi-step transition and hybrid horizon methods to mitigate the need for complex planning. Our experimental results demonstrate the effectiveness of these enhancements and highlight the potential of the active inference-based approach.
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
Gan, Qijun, Wang, Song, Wu, Shengtao, Zhu, Jianke
Recently, artificial intelligence techniques for education have been received increasing attentions, while it still remains an open problem to design the effective music instrument instructing systems. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses. We also introduce a powerful baseline model that generates hand motions from piano audios through a position predictor and a position-guided gesture generator. Furthermore, a series of evaluation metrics are designed to assess the performance of the baseline model, including motion similarity, smoothness, positional accuracy of left and right hands, and overall fidelity of movement distribution. Despite that piano key presses with respect to music scores or audios are already accessible, PianoMotion10M aims to provide guidance on piano fingering for instruction purposes.
Learning the Influence Graph of a High-Dimensional Markov Process with Memory
Bagewadi, Smita, Chatterjee, Avhishek
Motivated by multiple applications in social networks, nervous systems, and financial risk analysis, we consider the problem of learning the underlying (directed) influence graph or causal graph of a high-dimensional multivariate discrete-time Markov process with memory. At any discrete time instant, each observed variable of the multivariate process is a binary string of random length, which is parameterized by an unobservable or hidden [0,1]-valued scalar. The hidden scalars corresponding to the variables evolve according to discrete-time linear stochastic dynamics dictated by the underlying influence graph whose nodes are the variables. We extend an existing algorithm for learning i.i.d. graphical models to this Markovian setting with memory and prove that it can learn the influence graph based on the binary observations using logarithmic (in number of variables or nodes) samples when the degree of the influence graph is bounded. The crucial analytical contribution of this work is the derivation of the sample complexity result by upper and lower bounding the rate of convergence of the observed Markov process with memory to its stationary distribution in terms of the parameters of the influence graph.
ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects
Wang, Boshen, Ye, Bowei, Xu, Lin, Liang, Jie
In silico methods evaluating the mutation effects of missense mutations are providing an important approach for understanding mutations in personal genomes and identifying disease-relevant biomarkers. However, existing methods, including deep learning methods, heavily rely on sequence-aware information, and do not fully leverage the potential of available 3D structural information. In addition, these methods may exhibit an inability to predict mutations in domains difficult to formulate sequence-based embeddings. In this study, we introduce a novel rationale-guided graph neural network AlphaGMut to evaluate mutation effects and to distinguish pathogenic mutations from neutral mutations. We compute the alpha shapes of protein structures to obtain atomic-resolution edge connectivities and map them to an accurate residue-level graph representation. We then compute structural-, topological-, biophysical-, and sequence properties of the mutation sites, which are assigned as node attributes in the graph. These node attributes could effectively guide the graph neural network to learn the difference between pathogenic and neutral mutations using k-hop message passing with a short training period. We demonstrate that AlphaGMut outperforms state-of-the-art methods, including DeepMind's AlphaMissense, in many performance metrics. In addition, AlphaGMut has the advantage of performing well in alignment-free settings, which provides broader prediction coverage and better generalization compared to current methods requiring deep sequence-aware information.
Surprise! Using Physiological Stress for Allostatic Regulation Under the Active Inference Framework [Pre-Print]
Note: This manuscript has been accepted for publication at a conference in 2024 and will be published under the same title. The version in this pre-print will undergo minor edits and thus does not represent the final version of this work. Abstract-- Allostasis proposes that long-term viability of a living system is achieved through anticipatory adjustments of its physiology and behaviour: emphasising physiological and affective stress as an adaptive state of adaptation that minimizes long-term prediction errors. More recently, the active inference framework (AIF) has also sought to explain action and long-term adaptation through the minimization of future errors (free energy), through the learning of statistical contingencies of the world, offering a formalism for allostatic regulation. We suggest that framing prediction errors through the lens of biological hormonal dynamics proposed by allostasis offers a way to integrate these two models together in a biologically-plausible manner. In this paper, we describe our initial work in developing a model that grounds prediction errors (surprisal) into the secretion of a physiological stress hormone (cortisol) acting as an adaptive, allostatic mediator on a homeostatically-controlled physiology. We evaluate this using a computational model in simulations using an active inference agent endowed with an artificial physiology, regulated through homeostatic and allostatic control in a stochastic environment. Our results find that allostatic functions of cortisol (stress), secreted as a function of prediction errors, provide adaptive advantages to the agent's longterm physiological regulation. We argue that the coupling of information-theoretic prediction errors to low-level, biological hormonal dynamics of stress can provide a computationally efficient model to long-term regulation for embodied intelligent systems. A. Background In both biological and artificial systems, mechanisms of adaptation are critical to long-term stability and viability in dynamic, unpredictable environments.
Grounding Multimodal Large Language Models in Actions
Szot, Andrew, Mazoure, Bogdan, Agrawal, Harsh, Hjelm, Devon, Kira, Zsolt, Toshev, Alexander
Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens of action space adaptors. For continuous actions, we show that a learned tokenization allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions, we demonstrate that semantically aligning these actions with the native output token space of the MLLM leads to the strongest performance. We arrive at these lessons via a thorough study of seven action space adapters on five different environments, encompassing over 114 embodied tasks.
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation
Tong, Jingwen, Li, Xinran, Fu, Liqun, Zhang, Jun, Letaief, Khaled B.
Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known prior, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we study the cooperative resource allocation problem with unknown system dynamics of MRPs. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework to mitigate the communication overhead and data privacy issue by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys a high communication and computation efficiency, and a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines. More importantly, its sample complexity decreases with the number of agents.