 Markov Models


Masked World Models for Visual Control

arXiv.org Artificial Intelligence

Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet current approaches typically train a single model end-to-end to learn both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects. In this work, we introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning. Specifically, we train an autoencoder with convolutional layers and vision transformers (ViT) to reconstruct pixels given masked convolutional features, and learn a latent dynamics model that operates on the representations from the autoencoder. Moreover, to encode task-relevant information, we introduce an auxiliary reward prediction objective for the autoencoder. We continually update both the autoencoder and the dynamics model using online samples collected from environment interaction. We demonstrate that our decoupling approach achieves state-of-the-art performance on a variety of visual robotic tasks from Meta-world and RLBench, e.g., we achieve an 81.7% success rate on 50 visual robotic manipulation tasks from Meta-world, while the baseline achieves 67.9%. Code is available on the project website: https://sites.google.com/view/mwm-rl.


Adaptive and Collaborative Bathymetric Channel-Finding Approach for Multiple Autonomous Marine Vehicles

arXiv.org Artificial Intelligence

This paper reports an investigation into the problem of rapid identification of a channel that crosses a body of water using one or more Unmanned Surface Vehicles (USV). A new algorithm called Proposal Based Adaptive Channel Search (PBACS) is presented as a potential solution that improves upon current methods. The empirical performance of PBACS is compared to lawnmower surveying and to Markov decision process (MDP) planning with two state-of-the-art reward functions: Upper Confidence Bound (UCB) and Maximum Value Information (MVI). The performance of each method is evaluated through comparison of the time it takes to identify a continuous channel through an area, using one, two, three, or four USVs. The performance of each method is compared across ten simulated bathymetry scenarios and one field area, each with different channel layouts. The results from simulations and field trials indicate that on average multi-vehicle PBACS outperforms lawnmower, UCB, and MVI based methods, especially when at least three vehicles are used.


MULTIGAIN 2.0: MDP controller synthesis for multiple mean-payoff, LTL and steady-state constraints

arXiv.org Artificial Intelligence

We present MULTIGAIN 2.0, a major extension to the controller synthesis tool MultiGain, built on top of the probabilistic model checker PRISM. This new version extends MultiGain's multi-objective capabilities by allowing for the formal verification and synthesis of controllers for probabilistic systems with multi-dimensional long-run average reward structures, steady-state constraints, and linear temporal logic properties. Additionally, MULTIGAIN 2.0 provides an approach for finding finite-memory solutions and the capability for two- and three-dimensional visualization of Pareto curves to facilitate trade-off analysis in multi-objective scenarios.


Universal Approximation and the Topological Neural Network

arXiv.org Artificial Intelligence

A topological neural network (TNN), which takes data from a Tychonoff topological space instead of the usual finite-dimensional space, is introduced. As a consequence, a distributional neural network (DNN) that takes Borel measures as data is also introduced. Combined, these new neural networks facilitate tasks such as recognizing long-range dependence, heavy tails, and other properties in stochastic process paths, or acting on belief states produced by particle filtering or hidden Markov model algorithms. The veracity of the TNN and DNN is then established by a strong universal approximation theorem for Tychonoff spaces and its corollary for spaces of measures. These theorems show that neural networks can arbitrarily approximate uniformly continuous functions (with respect to the sup metric) associated with a unique uniformity. We also provide some discussion showing that neural networks on positive-finite measures are a generalization of the recent deep learning notion of deep sets.


Is Centralized Training with Decentralized Execution Framework Centralized Enough for MARL?

arXiv.org Artificial Intelligence

Centralized Training with Decentralized Execution (CTDE) has recently emerged as a popular framework for cooperative Multi-Agent Reinforcement Learning (MARL), where agents can use additional global state information to guide training in a centralized way and make their own decisions based only on decentralized local policies. Despite the encouraging results achieved, CTDE makes an independence assumption on agent policies, which limits agents from adopting global cooperative information from each other during centralized training. Therefore, we argue that the existing CTDE framework cannot fully utilize global information for training, leading to inefficient joint-policy exploration and even suboptimal results. In this paper, we introduce a novel Centralized Advising and Decentralized Pruning (CADP) framework for multi-agent reinforcement learning that not only enables an efficacious message exchange among agents during training but also guarantees independent policies for execution. First, CADP endows agents with an explicit communication channel to seek and take advice from different agents for more centralized training. To further ensure decentralized execution, we propose a smooth model pruning mechanism to progressively constrain the agent communication into a closed one without degradation in agent cooperation capability. Empirical evaluations on the StarCraft II micromanagement challenge and Google Research Football benchmarks, across different MARL backbones, demonstrate that the proposed framework achieves superior performance compared with state-of-the-art counterparts. Our code is available at https://github.com/zyh1999/CADP.


Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

arXiv.org Artificial Intelligence

Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.


A Simulation Environment and Reinforcement Learning Method for Waste Reduction

arXiv.org Artificial Intelligence

In retail (e.g., grocery stores, apparel shops, online retailers), inventory managers have to balance short-term risk (no items to sell) with long-term risk (over-ordering leading to product waste). This balancing task is made especially hard by the lack of information about future customer purchases. In this paper, we study the problem of restocking a grocery store's inventory with perishable items over time, from a distributional point of view. The objective is to maximize sales while minimizing waste, with uncertainty about the actual consumption by customers. This problem is of high relevance today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task that exhibits stochastic behavior conditioned on the agent's actions, making the environment partially observable. We make two main contributions. First, we introduce a new reinforcement learning environment, RetaiL, based on real grocery store data and expert knowledge. This environment is highly stochastic and presents a unique challenge for reinforcement learning practitioners. We show that uncertainty about the future behavior of the environment is not handled well by classical supply chain algorithms, and that distributional approaches are a good way to account for the uncertainty. Second, we introduce GTDQN, a distributional reinforcement learning algorithm that learns a generalized Tukey Lambda distribution over the reward space. GTDQN provides a strong baseline for our environment. It outperforms other distributional reinforcement learning approaches in this partially observable setting, in both overall reward and reduction of generated waste.


Explainable Activity Recognition for Smart Home Systems

arXiv.org Artificial Intelligence

Smart home environments are designed to provide services that help improve the quality of life for the occupant via a variety of sensors and actuators installed throughout the space. Many automated actions taken by a smart home are governed by the output of an underlying activity recognition system. However, activity recognition systems may not be perfectly accurate and therefore inconsistencies in smart home operations can lead users reliant on smart home predictions to wonder "why did the smart home do that?" In this work, we build on insights from Explainable Artificial Intelligence (XAI) techniques and introduce an explainable activity recognition framework in which we leverage leading XAI methods to generate natural language explanations that explain what about an activity led to the given classification. Within the context of remote caregiver monitoring, we perform a two-step evaluation: (a) utilize ML experts to assess the sensibility of explanations, and (b) recruit non-experts in two user remote caregiver monitoring scenarios, synchronous and asynchronous, to assess the effectiveness of explanations generated via our framework. Our results show that the XAI approach, SHAP, has a 92% success rate in generating sensible explanations. Moreover, in 83% of sampled scenarios users preferred natural language explanations over a simple activity label, underscoring the need for explainable activity recognition systems. Finally, we show that explanations generated by some XAI methods can lead users to lose confidence in the accuracy of the underlying activity recognition model. We make a recommendation regarding which existing XAI method leads to the best performance in the domain of smart home automation, and discuss a range of topics for future work to further improve explainable activity recognition.


Bayesian Reinforcement Learning for Automatic Voltage Control under Cyber-Induced Uncertainty

arXiv.org Artificial Intelligence

Voltage control is crucial to the reliable operation of large-scale power systems, as timely reactive power support can help prevent widespread outages. However, there is currently no built-in mechanism for power systems to ensure that the voltage control objective of maintaining reliable operation will survive the uncertainty caused by an adversary's presence. Hence, this work introduces a Bayesian Reinforcement Learning (BRL) approach for power system control problems, with a focus on sustained voltage control under uncertainty in a cyber-adversarial environment. This work proposes a data-driven BRL-based approach for automatic voltage control by formulating and solving a Partially-Observable Markov Decision Problem (POMDP), where the states are partially observable due to cyber intrusions. The techniques are evaluated on the WSCC and IEEE 14 bus systems. Additionally, BRL techniques assist in automatically finding a threshold for exploration and exploitation in various RL techniques.


NormMark: A Weakly Supervised Markov Model for Socio-cultural Norm Discovery

arXiv.org Artificial Intelligence

Norms, which are culturally accepted guidelines for behaviours, can be integrated into conversational models to generate utterances that are appropriate for the socio-cultural context. Existing methods for norm recognition tend to focus only on surface-level features of dialogues and do not take into account the interactions within a conversation. To address this issue, we propose NormMark, a probabilistic generative Markov model that carries the latent features throughout a dialogue. These features are captured by discrete and continuous latent variables conditioned on the conversation history, and improve the model's ability to recognize norms. The model is trainable on weakly annotated data using the variational technique. On a dataset with limited norm annotations, we show that our approach achieves a higher F1 score, outperforming current state-of-the-art methods, including GPT3.