 Reinforcement Learning


Optimal sequential decision making with probabilistic digital twins

arXiv.org Machine Learning

Digital twins are emerging in many industries, typically consisting of simulation models and data associated with a specific physical system. One of the main reasons for developing a digital twin is to enable the simulation of possible consequences of a given action, without the need to interfere with the physical system itself. Physical systems of interest, and the environments they operate in, do not always behave deterministically. Moreover, information about the system and its environment is typically incomplete or imperfect. Probabilistic representations of systems and environments may therefore be called for, especially to support decisions in application areas where actions may have severe consequences. In this paper we introduce the probabilistic digital twin (PDT). We will start by discussing how epistemic uncertainty can be treated using measure theory, by modelling epistemic information via $\sigma$-algebras. Based on this, we give a formal definition of how epistemic uncertainty can be updated in a PDT. We then study the problem of optimal sequential decision making. That is, we consider the case where the outcome of each decision may inform the next. Within the PDT framework, we formulate this optimization problem. We discuss how this problem may be solved (at least in theory) via the maximum principle method or the dynamic programming principle. However, due to the curse of dimensionality, these methods are often not tractable in practice. To address this, we propose a generic approximate solution using deep reinforcement learning together with neural networks defined on sets. We illustrate the method on a practical problem, considering optimal information gathering for the estimation of a failure probability.


Ali Asmari, PhD, Head of AI and Machine Learning at ULC Technologies – Interview Series

#artificialintelligence

Reinforcement Learning is a reward-based system for training complicated devices without complicated inverse kinematics calculations.


Going Deep on Deep Learning

#artificialintelligence

In a field with a tendency to jump from buzzword to buzzword, deep learning has proven to be a consistent driver of discovery and scientific progress -- but also a frequent source of confusion, and even trepidation. Well, TDS is here to help. A good place to start? This comprehensive introduction to Bayesian deep learning by Joris Baan, which he wrote with the explicit goal of bridging the gap between basic probability theory and cutting-edge research. You'd be hard-pressed to find a better article to read next than Lina Faik's patient explanation of deep Q learning, reinforcement learning, and how the two are powering new and exciting real-world applications.


Maximum Entropy RL (Provably) Solves Some Robust RL Problems

#artificialintelligence

Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL algorithms to perform markedly worse. As we aim to scale reinforcement learning algorithms and apply them in the real world, it is increasingly important to learn policies that are robust to changes in the environment. Broadly, prior approaches to handling distribution shift in RL aim to maximize performance in either the average case or the worst case. While these methods have been successfully applied to a number of areas (e.g., self-driving cars, robot locomotion and manipulation), their success rests critically on the design of the distribution of environments.


Best and No.1 Introduction to Reinforcement Learning! - WriteX.today

#artificialintelligence

Let's look at a simple example that illustrates the reinforcement learning mechanism. Consider the scenario of teaching new tricks to your cat. There are three approaches to implementing a Reinforcement Learning algorithm. In a value-based Reinforcement Learning method, you try to maximize a value function V(s). In this method, the agent expects a long-term return from the current state under policy π.
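To make the value function V(s) concrete, here is a minimal sketch of policy evaluation on a hypothetical three-state chain. The states, rewards, discount factor, and the function name `evaluate_policy` are assumptions made for illustration and do not come from the article. The sketch only evaluates a fixed policy π; it does not search for a better one.

```python
# Hypothetical 3-state chain: under a fixed policy pi the agent moves
# 0 -> 1 -> 2 deterministically, and state 2 is terminal.
GAMMA = 0.9  # discount factor (assumed)

# transitions[s] = (next_state, reward) when following policy pi in state s
transitions = {0: (1, 0.0), 1: (2, 1.0)}
TERMINAL = 2


def evaluate_policy(transitions, terminal, gamma, sweeps=100):
    """Repeatedly apply V(s) <- r + gamma * V(s') for a deterministic policy."""
    values = {s: 0.0 for s in list(transitions) + [terminal]}
    for _ in range(sweeps):
        for s, (s_next, reward) in transitions.items():
            values[s] = reward + gamma * values[s_next]
    return values


values = evaluate_policy(transitions, TERMINAL, GAMMA)
print(values)
```

Here V(1) is the immediate reward of 1.0, and V(0) is that same reward discounted by one step, which is the sense in which V(s) measures long-term return from a state.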


Large Batch Simulation for Deep Reinforcement Learning

arXiv.org Artificial Intelligence

We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around the principle of "batch simulation": accepting and executing large batches of requests simultaneously. Beyond exposing large amounts of work at once, batch simulation allows implementations to amortize in-memory storage of scene assets, rendering work, data loading, and synchronization costs across many simulation requests, dramatically improving the number of simulated agents per GPU and overall simulation throughput. To balance DNN inference and training costs with faster simulation, we also build a computationally efficient policy DNN that maintains high task performance, and modify training algorithms to maintain sample efficiency when training with large mini-batches. By combining batch simulation and DNN performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system using a 64-GPU cluster over three days. We provide open-source reference implementations of our batch 3D renderer and simulator to facilitate incorporation of these ideas into RL systems.
It is now common for modern reinforcement learning (RL) algorithms leveraging deep neural networks (DNNs) to require billions of samples of experience from simulated environments (Wijmans et al., 2020; Petrenko et al., 2020; OpenAI et al., 2019; Silver et al., 2017; Vinyals et al., 2019). For embodied AI tasks such as visual navigation, where the ultimate goal for learned policies is deployment in the real world, learning from realistic simulations is important for successful transfer of learned policies to physical robots. In these cases simulators must render detailed 3D scenes and simulate agent interaction with complex environments (Kolve et al., 2017; Dosovitskiy et al., 2017; Savva et al., 2019; Xia et al., 2020; Gan et al., 2020). Evaluating and training a DNN on billions of simulated samples is computationally expensive.
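The batch simulation idea can be illustrated with a toy sketch in which one call advances many environments at once. This is not the authors' renderer or simulator: the class `BatchLineWorld`, its 1-D world, and its reward rule are hypothetical, chosen only to show the interface shape of accepting a whole batch of actions in a single request.

```python
import numpy as np


class BatchLineWorld:
    """Hypothetical toy simulator: N independent agents on a 1-D line, stepped together."""

    def __init__(self, num_envs, goal=2):
        self.positions = np.zeros(num_envs, dtype=np.int64)
        self.goal = goal

    def step(self, actions):
        """Advance every environment at once; actions holds one -1/+1 move per environment."""
        self.positions = self.positions + actions
        done = self.positions >= self.goal
        rewards = done.astype(np.float64)  # reward 1.0 on reaching the goal
        # Environments that finished are reset in place, so the batch stays full.
        self.positions = np.where(done, 0, self.positions)
        return self.positions.copy(), rewards, done


env = BatchLineWorld(num_envs=4, goal=2)
env.step(np.array([1, 1, -1, 1]))
positions, rewards, done = env.step(np.array([1, -1, 1, 1]))
print(positions, rewards, done)
```

Because the state of all environments lives in a single array, each step is one vectorized operation rather than a loop of per-environment calls, which is the amortization the abstract describes at a much larger scale.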


A Reinforcement Learning Based Approach to Play Calling in Football

arXiv.org Machine Learning

Advances in computing power, and in the ability to both acquire and store huge quantities of data, have brought a corresponding advance of the machine (aka algorithm) as a replacement for the human as a primary source of decision making. The number of successful applications is increasing at a rapid pace, from games such as Chess and Go, to medical imaging and diagnosing tumours, to automated driving, and even the selection of candidates for jobs. The notion of reinforcement learning is one key principle, whereby a game or set of decisions is studied and rewards are recorded so that a machine can learn the long-term benefits of local decisions, often negotiating a sequence of complex decisions. For example, Silver et al. (2017) discuss how a machine can become an expert at the game Go simply by playing against itself, with Bai and Jin (2020) looking at more general self-play algorithms.


On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks

arXiv.org Machine Learning

This paper studies the statistical theory of offline reinforcement learning with deep ReLU networks. We consider the off-policy evaluation (OPE) problem where the goal is to estimate the expected discounted reward of a target policy given the logged data generated by unknown behaviour policies. We study a regression-based fitted Q evaluation (FQE) method using deep ReLU networks and characterize a finite-sample bound on the estimation error of this method under mild assumptions. The prior works in OPE with either general function approximation or deep ReLU networks ignore the data-dependent structure in the algorithm, dodging the technical bottleneck of OPE, while requiring a rather restricted regularity assumption. In this work, we overcome these limitations and provide a comprehensive analysis of OPE with deep ReLU networks. In particular, we precisely quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error. Consequently, we provide insights into the interplay between offline reinforcement learning and deep learning.
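The regression loop behind fitted Q evaluation can be sketched in a few lines. The sketch below swaps the paper's deep ReLU networks for a plain lookup table, so it illustrates only the iteration and says nothing about the paper's analysis. The logged transitions, the target policy, and the function name `fitted_q_evaluation` are hypothetical.

```python
import numpy as np


def fitted_q_evaluation(data, target_policy, n_states, n_actions, gamma=0.9, iters=200):
    """Tabular FQE: repeatedly regress Q(s, a) onto r + gamma * Q(s', pi(s')).

    With a table as the function class, the least-squares fit is the mean
    regression target for each (s, a) pair that appears in the logged data.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        target_sum = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s_next, done in data:
            bootstrap = 0.0 if done else q[s_next, target_policy[s_next]]
            target_sum[s, a] += r + gamma * bootstrap
            counts[s, a] += 1
        seen = counts > 0
        new_q = q.copy()
        new_q[seen] = target_sum[seen] / counts[seen]
        q = new_q
    return q


# Hypothetical logged data from a behaviour policy: (state, action, reward, next_state, done)
logged = [
    (0, 0, 0.0, 1, False),
    (0, 1, 0.0, 0, False),
    (1, 0, 1.0, 1, True),
    (1, 1, 0.0, 0, False),
]
target_policy = np.array([0, 0])  # the target policy always picks action 0
q_hat = fitted_q_evaluation(logged, target_policy, n_states=2, n_actions=2)
value_estimate = q_hat[0, target_policy[0]]  # estimated value of starting in state 0
print(value_estimate)
```

The logged data include actions the target policy never takes, which is the off-policy aspect: the bootstrap term always follows the target policy, regardless of which action was logged.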


Discovering Diverse Solutions in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning (RL) algorithms are typically limited to learning a single solution of a specified task, even though there often exist diverse solutions to a given task. Compared with learning a single solution, learning a set of diverse solutions is beneficial because diverse solutions enable robust few-shot adaptation and allow the user to select a preferred solution. Although previous studies have shown that diverse behaviors can be modeled with a policy conditioned on latent variables, an approach for modeling an infinite set of diverse solutions with continuous latent variables has not been investigated. In this study, we propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable. Through continuous control tasks, we demonstrate that our method can learn diverse solutions in a data-efficient manner and that the solutions can be used for few-shot adaptation to solve unseen tasks.


Adversarial attacks in consensus-based multi-agent reinforcement learning

arXiv.org Artificial Intelligence

Recently, many cooperative distributed multi-agent reinforcement learning (MARL) algorithms have been proposed in the literature. In this work, we study the effect of adversarial attacks on a network that employs a consensus-based MARL algorithm. We show that an adversarial agent can persuade all the other agents in the network to implement policies that optimize an objective that it desires. In this sense, the standard consensus-based MARL algorithms are fragile to attacks.
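The fragility of consensus can be seen in a stripped-down sketch that keeps only the averaging step and drops the learning updates entirely. It is not the algorithm studied in the paper: the four-agent ring, the weight matrix, the initial values, and the function name `run_consensus` are assumptions for illustration. Each agent holds one scalar parameter, and the adversary simply refuses to average.

```python
import numpy as np


def run_consensus(values, weights, adversary=None, steps=500):
    """Iterate x <- W x; an adversarial agent keeps its own value instead of averaging."""
    x = np.array(values, dtype=float)
    for _ in range(steps):
        x_next = weights @ x
        if adversary is not None:
            x_next[adversary] = x[adversary]  # adversary ignores its neighbours
        x = x_next
    return x


# Hypothetical ring of 4 agents: each averages itself with its two neighbours.
weights = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
initial = [1.0, 2.0, 3.0, 10.0]

honest = run_consensus(initial, weights)                 # everyone cooperates
attacked = run_consensus(initial, weights, adversary=3)  # agent 3 never updates
print(honest, attacked)
```

With all agents cooperating, the values settle at the average of the initial values. With one agent holding its value fixed, every other agent is pulled to that agent's value, which mirrors in miniature the claim that a single adversary can steer the whole network.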