AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Optimal sequential decision making with probabilistic digital twins

Agrell, Christian, Dahl, Kristina Rognlien, Hafver, Andreas

arXiv.org Machine Learning, Mar-12-2021

Digital twins are emerging in many industries, typically consisting of simulation models and data associated with a specific physical system. One of the main reasons for developing a digital twin is to enable the simulation of possible consequences of a given action, without the need to interfere with the physical system itself. Physical systems of interest, and the environments they operate in, do not always behave deterministically. Moreover, information about the system and its environment is typically incomplete or imperfect. Probabilistic representations of systems and environments may therefore be called for, especially to support decisions in application areas where actions may have severe consequences. In this paper we introduce the probabilistic digital twin (PDT). We start by discussing how epistemic uncertainty can be treated using measure theory, by modelling epistemic information via $\sigma$-algebras. Based on this, we give a formal definition of how epistemic uncertainty can be updated in a PDT. We then study the problem of optimal sequential decision making, that is, the case where the outcome of each decision may inform the next. Within the PDT framework, we formulate this optimization problem. We discuss how this problem may be solved (at least in theory) via the maximum principle method or the dynamic programming principle. However, due to the curse of dimensionality, these methods are often not tractable in practice. To address this, we propose a generic approximate solution using deep reinforcement learning together with neural networks defined on sets. We illustrate the method on a practical problem, considering optimal information gathering for the estimation of a failure probability.

epistemic uncertainty, information, sequential decision, (17 more...)

arXiv: 2103.07405

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Ali Asmari, PhD, Head of AI and Machine Learning at ULC Technologies – Interview Series

#artificialintelligence, Mar-11-2021, 23:10:38 GMT

Reinforcement learning is a reward-based approach to training complicated devices without complicated inverse kinematics calculations.

ai and machine learning, ali asmari, interview series, (1 more...)

Industry: Media > News (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)

Going Deep on Deep Learning

#artificialintelligence, Mar-11-2021, 14:35:30 GMT

In a field with a tendency to jump from buzzword to buzzword, deep learning has proven to be a consistent driver of discovery and scientific progress -- but also a frequent source of confusion, and even trepidation. Well, TDS is here to help. A good place to start? This comprehensive introduction to Bayesian deep learning by Joris Baan, which he wrote with the explicit goal of bridging the gap between basic probability theory and cutting-edge research. You'd be hard-pressed to find a better article to read next than Lina Faik's patient explanation of deep Q learning, reinforcement learning, and how the two are powering new and exciting real-world applications.

buzzword, deep learning, workflow

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.79)

Maximum Entropy RL (Provably) Solves Some Robust RL Problems

#artificialintelligence, Mar-11-2021, 05:50:52 GMT

Nearly all real-world applications of reinforcement learning involve some degree of shift between the training environment and the testing environment. However, prior work has observed that even small shifts in the environment cause most RL algorithms to perform markedly worse. As we aim to scale reinforcement learning algorithms and apply them in the real world, it is increasingly important to learn policies that are robust to changes in the environment. Broadly, prior approaches to handling distribution shift in RL aim to maximize performance in either the average case or the worst case. While these methods have been successfully applied to a number of areas (e.g., self-driving cars, robot locomotion and manipulation), their success rests critically on the design of the distribution of environments.

algorithm, maxent rl, rl algorithm, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.41)

Best and No.1 Introduction to Reinforcement Learning! - WriteX.today

#artificialintelligence, Mar-11-2021, 04:10:34 GMT

Let's look at a simple example that illustrates the reinforcement learning mechanism. Consider the scenario of teaching new tricks to your cat. There are three approaches to implementing a Reinforcement Learning algorithm. In a value-based Reinforcement Learning method, you try to maximize a value function V(s). In this method, the agent expects a long-term return from the current state under policy π.
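To make the value function V(s) concrete, here is a minimal sketch of iterative policy evaluation, which computes the expected discounted return of each state under a fixed policy π. The two-state cat-training MDP, the state and action names, the rewards, and the discount factor are all hypothetical choices for illustration and do not come from the article.

```python
def evaluate_policy(transitions, policy, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation for a small, fully known MDP.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    policy[s] is the single action taken in state s (deterministic policy).
    Returns a dict mapping each state to its estimated value V(s).
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            # Bellman expectation backup for the action the policy picks in s.
            new_v = sum(
                p * (r + gamma * values[s2])
                for p, s2, r in transitions[s][policy[s]]
            )
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


# Hypothetical MDP: calling the cat brings it near (no reward yet);
# once near, practising the trick earns a reward of 1 on every step.
transitions = {
    "far": {"call": [(1.0, "near", 0.0)]},
    "near": {"practise_trick": [(1.0, "near", 1.0)]},
}
policy = {"far": "call", "near": "practise_trick"}
values = evaluate_policy(transitions, policy)
print(values)
```

With a discount of 0.9, the state "near" is worth about 1 / (1 - 0.9) = 10 and "far" about 0.9 × 10 = 9, which shows how V(s) summarizes long-term return instead of immediate reward.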

learning, reinforcement, reinforcement learning, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Large Batch Simulation for Deep Reinforcement Learning

Shacklett, Brennan, Wijmans, Erik, Petrenko, Aleksei, Savva, Manolis, Batra, Dhruv, Koltun, Vladlen, Fatahalian, Kayvon

arXiv.org Artificial Intelligence, Mar-11-2021

We accelerate deep reinforcement learning-based training in visually complex 3D environments by two orders of magnitude over prior work, realizing end-to-end training speeds of over 19,000 frames of experience per second on a single GPU and up to 72,000 frames per second on a single eight-GPU machine. The key idea of our approach is to design a 3D renderer and embodied navigation simulator around the principle of "batch simulation": accepting and executing large batches of requests simultaneously. Beyond exposing large amounts of work at once, batch simulation allows implementations to amortize in-memory storage of scene assets, rendering work, data loading, and synchronization costs across many simulation requests, dramatically improving the number of simulated agents per GPU and overall simulation throughput. To balance DNN inference and training costs with faster simulation, we also build a computationally efficient policy DNN that maintains high task performance, and modify training algorithms to maintain sample efficiency when training with large mini-batches. By combining batch simulation and DNN performance optimizations, we demonstrate that PointGoal navigation agents can be trained in complex 3D environments on a single GPU in 1.5 days to 97% of the accuracy of agents trained on a prior state-of-the-art system using a 64-GPU cluster over three days. We provide open-source reference implementations of our batch 3D renderer and simulator to facilitate incorporation of these ideas into RL systems. It is now common for modern reinforcement learning (RL) algorithms leveraging deep neural networks (DNNs) to require billions of samples of experience from simulated environments (Wijmans et al., 2020; Petrenko et al., 2020; OpenAI et al., 2019; Silver et al., 2017; Vinyals et al., 2019). 
For embodied AI tasks such as visual navigation, where the ultimate goal for learned policies is deployment in the real world, learning from realistic simulations is important for successful transfer of learned policies to physical robots. In these cases simulators must render detailed 3D scenes and simulate agent interaction with complex environments (Kolve et al., 2017; Dosovitskiy et al., 2017; Savva et al., 2019; Xia et al., 2020; Gan et al., 2020). Evaluating and training a DNN on billions of simulated samples is computationally expensive.

agent, batch size, ijman 20, (16 more...)

arXiv: 2103.07013

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Reinforcement Learning Based Approach to Play Calling in Football

Biro, Preston, Walker, Stephen G.

arXiv.org Machine Learning, Mar-11-2021

With the advances in computer power and the ability to both acquire and store huge quantities of data, so goes the corresponding advance of the machine (aka algorithm) to replace the human as a primary source of decision making. The number of successful applications is increasing at a rapid pace, from games such as Chess and Go, to medical imaging and diagnosing tumours, to automated driving, and even the selection of candidates for jobs. The notion of reinforcement learning is one key principle, whereby a game or set of decisions is studied and rewards recorded so a machine can learn long-term benefits from local decisions, often negotiating a sequence of complex decisions. For example, Silver et al. (2017) discuss how a machine can become an expert at the game Go simply by playing against itself, with Bai and Jin (2020) looking at more general self-play algorithms.

information, probability, scenario, (17 more...)

arXiv: 2103.06939

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks

Nguyen-Tang, Thanh, Gupta, Sunil, Tran-The, Hung, Venkatesh, Svetha

arXiv.org Machine Learning, Mar-11-2021

This paper studies the statistical theory of offline reinforcement learning with deep ReLU networks. We consider the off-policy evaluation (OPE) problem where the goal is to estimate the expected discounted reward of a target policy given the logged data generated by unknown behaviour policies. We study a regression-based fitted Q evaluation (FQE) method using deep ReLU networks and characterize a finite-sample bound on the estimation error of this method under mild assumptions. Prior work on OPE, with either general function approximation or deep ReLU networks, ignores the data-dependent structure in the algorithm, sidestepping the technical bottleneck of OPE while requiring a rather restrictive regularity assumption. In this work, we overcome these limitations and provide a comprehensive analysis of OPE with deep ReLU networks. In particular, we precisely quantify how the distribution shift of the offline data, the dimension of the input space, and the regularity of the system control the OPE estimation error. Consequently, we provide insights into the interplay between offline reinforcement learning and deep learning.
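The regression-based fitted Q evaluation loop can be sketched in a few lines. The version below is a tabular stand-in, not the paper's method: it replaces the deep ReLU network regression with a per-(state, action) average, which is the least-squares fit in the tabular case. The function name, the toy dataset, and the target policy are assumptions made for illustration.

```python
from collections import defaultdict


def fitted_q_evaluation(dataset, target_policy, gamma=0.9, iterations=200):
    """Tabular fitted Q evaluation (FQE) on logged transitions.

    dataset is a list of (state, action, reward, next_state) tuples collected
    by some behaviour policy. target_policy(state) returns the action of the
    policy being evaluated. Each iteration regresses Q onto the bootstrapped
    targets r + gamma * Q(s', pi(s')); here the regression is a plain average.
    """
    q = defaultdict(float)
    for _ in range(iterations):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next in dataset:
            target = r + gamma * q[(s_next, target_policy(s_next))]
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # Least-squares fit in the tabular case: the mean target per (s, a).
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


# Hypothetical logged data: one state, two actions, both tried by the
# behaviour policy. Action 1 pays reward 1, action 0 pays nothing.
dataset = [(0, 0, 0.0, 0), (0, 1, 1.0, 0), (0, 0, 0.0, 0), (0, 1, 1.0, 0)]
q = fitted_q_evaluation(dataset, target_policy=lambda s: 1)
print(q[(0, 1)], q[(0, 0)])
```

Although the logged data mixes both actions, the estimate for the target policy "always take action 1" approaches 1 / (1 - 0.9) = 10, which is the sense in which OPE estimates a policy's discounted reward from data generated by other policies.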

finite-sample analysis, function approximation, offline reinforcement learning, (11 more...)

arXiv: 2103.06671

Country:

Oceania > Australia (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Discovering Diverse Solutions in Deep Reinforcement Learning

Osa, Takayuki, Tangkaratt, Voot, Sugiyama, Masashi

arXiv.org Artificial Intelligence, Mar-11-2021

Reinforcement learning (RL) algorithms are typically limited to learning a single solution of a specified task, even though there often exist diverse solutions to a given task. Compared with learning a single solution, learning a set of diverse solutions is beneficial because diverse solutions enable robust few-shot adaptation and allow the user to select a preferred solution. Although previous studies have shown that diverse behaviors can be modeled with a policy conditioned on latent variables, an approach for modeling an infinite set of diverse solutions with continuous latent variables has not been investigated. In this study, we propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable. Through continuous control tasks, we demonstrate that our method can learn diverse solutions in a data-efficient manner and that the solutions can be used for few-shot adaptation to solve unseen tasks.

continuous latent variable, latent variable, ltd3, (15 more...)

arXiv: 2103.07084

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Adversarial attacks in consensus-based multi-agent reinforcement learning

Figura, Martin, Kosaraju, Krishna Chaitanya, Gupta, Vijay

arXiv.org Artificial Intelligence, Mar-11-2021

Recently, many cooperative distributed multi-agent reinforcement learning (MARL) algorithms have been proposed in the literature. In this work, we study the effect of adversarial attacks on a network that employs a consensus-based MARL algorithm. We show that an adversarial agent can persuade all the other agents in the network to implement policies that optimize an objective that it desires. In this sense, the standard consensus-based MARL algorithms are fragile to attacks.
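The fragility of consensus updates can be illustrated with a toy sketch. This is not the paper's algorithm or attack: it reduces each agent's shared parameter to a single scalar, uses plain neighbour averaging as the consensus step, and models the adversary as an agent that simply refuses to update. The graph, the initial values, and the function name are hypothetical.

```python
def run_consensus(values, neighbors, adversary=None, steps=500):
    """Repeated neighbour averaging over a fixed communication graph.

    values[i] is agent i's current scalar estimate; neighbors[i] lists the
    agents it hears from. An adversarial agent (if any) ignores its
    neighbours and keeps broadcasting its own value unchanged.
    """
    values = list(values)
    for _ in range(steps):
        new_values = []
        for i, own in enumerate(values):
            if i == adversary:
                new_values.append(own)  # adversary never moves
            else:
                group = [own] + [values[j] for j in neighbors[i]]
                new_values.append(sum(group) / len(group))
        values = new_values
    return values


# Hypothetical line graph 0 - 1 - 2 - 3; agent 0 starts at 10, the rest at 0.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
start = [10.0, 0.0, 0.0, 0.0]

honest = run_consensus(start, neighbors)
attacked = run_consensus(start, neighbors, adversary=0)
print(honest)
print(attacked)
```

When every agent cooperates, the network settles on a common value strictly between the initial extremes. When agent 0 holds its value fixed, every other agent is pulled all the way to 10, a simple analogue of one agent steering the whole network toward an objective of its choosing.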

agent, algorithm, reinforcement, (16 more...)

arXiv: 2103.06967

Country: North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.85)
Government > Military (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
