AITopics

2007.07206

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence, Jul-28-2020

Lifelong Incremental Reinforcement Learning with Online Bayesian Inference

Wang, Zhi, Chen, Chunlin, Dong, Daoyi

A central capability of a long-lived reinforcement learning (RL) agent is to incrementally adapt its behavior as its environment changes, and to incrementally build upon previous experiences to facilitate future learning in real-world scenarios. In this paper, we propose LifeLong Incremental Reinforcement Learning (LLIRL), a new incremental algorithm for efficient lifelong adaptation to dynamic environments. We develop and maintain a library that contains an infinite mixture of parameterized environment models, which is equivalent to clustering environment parameters in a latent space. The prior distribution over the mixture is formulated as a Chinese restaurant process (CRP), which incrementally instantiates new environment models without any external information to signal environmental changes in advance. During lifelong learning, we employ the expectation maximization (EM) algorithm with online Bayesian inference to update the mixture in a fully incremental manner. In EM, the E-step involves estimating the posterior expectation of environment-to-cluster assignments, while the M-step updates the environment parameters for future learning. This method allows for all environment models to be adapted as necessary, with new models instantiated for environmental changes and old models retrieved when previously seen environments are encountered again. Experiments demonstrate that LLIRL outperforms relevant existing methods, and enables effective incremental adaptation to various dynamic environments for lifelong learning.
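The Chinese restaurant process prior and the E-step named in the abstract can be illustrated with a minimal sketch. It assumes the standard CRP form (an existing cluster is chosen in proportion to its count, a new one in proportion to a concentration parameter) and the usual Bayes rule for the assignment posterior. The function names, `alpha`, and `likelihoods` are hypothetical and are not taken from the LLIRL code.

```python
import random

def crp_prior(counts, alpha):
    """CRP prior over existing clusters plus one potential new cluster.

    counts: number of past assignments per existing cluster.
    alpha:  concentration parameter (> 0); larger values favor new clusters.
    """
    total = sum(counts) + alpha
    probs = [n / total for n in counts]
    probs.append(alpha / total)  # last entry: probability of a new cluster
    return probs

def assignment_posterior(counts, alpha, likelihoods):
    """E-step sketch: posterior is proportional to prior times likelihood.

    likelihoods: one value per existing cluster plus one for a new cluster
    (hypothetical inputs; how they are computed is outside this sketch).
    """
    prior = crp_prior(counts, alpha)
    unnorm = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def crp_assign(counts, alpha, rng):
    """Sample a cluster index from the CRP prior and update counts in place."""
    probs = crp_prior(counts, alpha)
    k = rng.choices(range(len(probs)), weights=probs)[0]
    if k == len(counts):
        counts.append(1)  # instantiate a new cluster
    else:
        counts[k] += 1
    return k

rng = random.Random(0)
counts = []
assignments = [crp_assign(counts, alpha=1.0, rng=rng) for _ in range(20)]
```

In this sketch no external signal marks an environment change: a new cluster appears whenever the sampled index equals the current number of clusters.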

dynamic environment, machine learning, reinforcement learning, (16 more...)

2007.14196

Country:

Oceania > Australia > New South Wales (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre:

Research Report (0.82)
Instructional Material (0.56)

Industry: Education > Educational Setting (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Okada, Masashi, Taniguchi, Tadahiro

Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction

arXiv.org Artificial Intelligence, Jul-28-2020

In the present paper, we propose a decoder-free extension of Dreamer, a leading model-based reinforcement learning (MBRL) method from pixels. Dreamer is a sample- and cost-efficient solution to robot learning, as it is used to train latent state-space models based on a variational autoencoder and to conduct policy optimization by latent trajectory imagination. However, this autoencoding-based approach often causes object vanishing, in which the autoencoder fails to perceive key objects for solving control tasks, which significantly limits Dreamer's potential. This work aims to relieve this bottleneck of Dreamer and enhance its performance by removing the decoder. For this purpose, we first derive a likelihood-free and InfoMax objective of contrastive learning from the evidence lower bound of Dreamer. Second, we incorporate two components, (i) independent linear dynamics and (ii) random crop data augmentation, into the learning scheme so as to improve the training performance. In comparison to Dreamer and other recent model-free reinforcement learning methods, our newly devised Dreamer with InfoMax and without generative decoder (Dreaming) achieves the best scores on 5 difficult simulated robotics tasks, in which Dreamer suffers from object vanishing.
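As a rough illustration of what a likelihood-free contrastive objective looks like, the sketch below computes the widely used InfoNCE loss on a matrix of similarity scores. This is an assumption made for illustration: the abstract does not state the exact form of the objective derived in the paper, and the `scores` input and the function name are hypothetical.

```python
import math

def info_nce(scores):
    """Mean InfoNCE loss for a square score matrix.

    scores[i][j] is a similarity between anchor i and candidate j;
    the diagonal entries scores[i][i] are the positive pairs and the
    off-diagonal entries act as negatives.
    """
    loss = 0.0
    for i, row in enumerate(scores):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        loss += log_z - row[i]  # negative log-softmax of the positive pair
    return loss / len(scores)

# Uninformative scores give log(batch size); well-separated positives give a loss near 0.
uniform_loss = info_nce([[0.0, 0.0], [0.0, 0.0]])
separated_loss = info_nce([[10.0, 0.0], [0.0, 10.0]])
```

No pixel reconstruction term appears in this loss, which is the sense in which such an objective is decoder-free.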

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.14535

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligence, Jul-27-2020, 23:25:10 GMT

DeepMind's Newest AI Programs Itself to Make All the Right Decisions

Three main deep learning approaches are supervised, unsupervised, and reinforcement learning. The first two consume huge amounts of data (like images or articles), look for patterns in the data, and use those patterns to inform actions (like identifying an image of a cat). To us, this is a pretty alien way to learn about the world. Not only would it be mind-numbingly dull to review millions of cat images, it'd take us years or more to do what these programs do in hours or days. And of course, we can learn what a cat looks like from just a few examples.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.82)

#artificialintelligence, Jul-27-2020, 19:46:56 GMT

Coaching in 2030: How Artificial Intelligence Will Change Our Profession - SimpliFaster

Simply put, for the last 200 years, advisers have worked on the principle of information asymmetry, where they have better information than their clients. Today, we are at the point where machine intelligence is gaining information asymmetry over advisers, and that's only going to get more acute and asymmetrical as time goes on. The only possible hope for human advisers is that they co-opt machine intelligence into their process.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Country:

Oceania > Australia (0.04)
North America > United States > Florida > Orange County (0.04)

Industry:

Leisure & Entertainment > Sports (0.70)
Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

#artificialintelligence, Jul-27-2020, 12:45:42 GMT

Pluralsight to Help Machine Learning Enthusiasts Skill Up with AWS DeepRacer

First announced in November 2018, AWS DeepRacer is a fun and interesting way to get rolling with reinforcement learning (RL), literally, with the fully …

artificial intelligence, help machine learning enthusiast skill, reinforcement learning, (2 more...)

#artificialintelligence

Industry: Media > News (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Pierrot, Thomas, Perrin, Nicolas, Behbahani, Feryal, Laterre, Alexandre, Sigaud, Olivier, Beguir, Karim, de Freitas, Nando

Learning Compositional Neural Programs for Continuous Control

We propose a novel solution to challenging sparse-reward, continuous control problems that require hierarchical planning at multiple levels of abstraction. Our solution, dubbed AlphaNPI-X, involves three separate stages of learning. First, we use off-policy reinforcement learning algorithms with experience replay to learn a set of atomic goal-conditioned policies, which can be easily repurposed for many tasks. Second, we learn self-models describing the effect of the atomic policies on the environment. Third, the self-models are harnessed to learn recursive compositional programs with multiple levels of abstraction. The key insight is that the self-models enable planning by imagination, obviating the need for interaction with the world when learning higher-level compositional programs. To accomplish the third stage of learning, we extend the AlphaNPI algorithm, which applies AlphaZero to learn recursive neural programmer-interpreters. We empirically show that AlphaNPI-X can effectively learn to tackle challenging sparse manipulation tasks, such as stacking multiple blocks, where powerful model-free baselines fail.

arxiv preprint arxiv, deep learning, neural network, (18 more...)

2007.13363

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Greedy Bandits with Sampled Context

Huh, Dom

Bayesian strategies for contextual bandits have proved promising in single-state reinforcement learning tasks by modeling uncertainty using context information from the environment. In this paper, we propose Greedy Bandits with Sampled Context (GB-SC), a method for contextual multi-armed bandits that develops the prior from the context information using Thompson Sampling and selects arms using an epsilon-greedy policy. The GB-SC framework allows for evaluation of context-reward dependency and provides robustness to partially observable context vectors by leveraging the developed prior. Our experimental results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret, as well as insights into how each context subset affects decision-making.
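The two ingredients named in the abstract, Thompson Sampling and an epsilon-greedy policy, can be combined in a small sketch. This is not the GB-SC algorithm: it ignores context entirely and assumes Bernoulli rewards with Beta posteriors per arm. The class, function names, and parameter values are hypothetical.

```python
import random

class BetaArm:
    """Beta(a, b) posterior over the success probability of one arm."""

    def __init__(self):
        self.a = 1.0  # prior pseudo-count of successes
        self.b = 1.0  # prior pseudo-count of failures

    def sample(self, rng):
        # Thompson Sampling step: draw a plausible success probability.
        return rng.betavariate(self.a, self.b)

    def update(self, reward):
        # Conjugate update for a 0/1 reward.
        self.a += reward
        self.b += 1 - reward

def select_arm(arms, epsilon, rng):
    """Explore uniformly with probability epsilon, else act greedily on sampled values."""
    if rng.random() < epsilon:
        return rng.randrange(len(arms))
    samples = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)

def run(true_probs, steps, epsilon, seed):
    """Simulate the bandit loop and return the number of pulls per arm."""
    rng = random.Random(seed)
    arms = [BetaArm() for _ in true_probs]
    pulls = [0] * len(true_probs)
    for _ in range(steps):
        k = select_arm(arms, epsilon, rng)
        reward = 1 if rng.random() < true_probs[k] else 0
        arms[k].update(reward)
        pulls[k] += 1
    return pulls

pulls = run(true_probs=[0.1, 0.9], steps=2000, epsilon=0.1, seed=0)
```

With these settings most pulls should concentrate on the arm with the higher success probability, while the epsilon term keeps a small amount of uniform exploration.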

data mining, machine learning, reinforcement learning, (16 more...)

2007.16001

Country:

North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Brown, Noam, Bakhtin, Anton, Lerer, Adam, Gong, Qucheng

The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by the success of AlphaZero. However, algorithms of this form have been unable to cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search for imperfect-information games. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results show ReBeL leads to low exploitability in benchmark imperfect-information games and achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI. We also prove that ReBeL converges to a Nash equilibrium in two-player zero-sum games in tabular settings.

machine learning, reinforcement learning, subgame, (20 more...)

2007.13544

Country:

North America > United States > Texas (0.25)
North America > United States > Rhode Island (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Poker (0.88)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

Bennett, Andrew, Kallus, Nathan, Li, Lihong, Mousavi, Ali

A fundamental question in offline reinforcement learning (RL) is how to estimate the value of some target evaluation policy, defined as the long-run average reward obtained by following the policy, using data logged by running a different behavior policy. This question, known as off-policy evaluation (OPE), often arises in applications such as healthcare, education, or robotics, where experimenting with running the target policy can be expensive or even impossible, but we have data logged following business as usual or current standards of care. A central concern in using such passively observed data is that observed actions, rewards, and transitions may be confounded by unobserved variables, which can bias standard OPE methods that assume no unobserved confounders, or equivalently that a standard Markov decision process (MDP) model holds with fully observed state. Consider, for example, evaluating a new smartphone app to help people living with type-1 diabetes time their insulin injections by monitoring their blood glucose level using some wearable device. Rather than risking giving bad advice that may harm individuals, we may consider first evaluating our injection-timing policy using existing longitudinal observations of individuals' blood glucose levels over time and the timing of insulin injections.
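For orientation, the sketch below shows the kind of standard estimator the abstract says can be biased: a plain importance-sampling estimate in a simplified single-step setting. It is not the method proposed in the paper, and it assumes the logged behavior probabilities fully explain action choice (no unobserved confounders). The log format and the names `logs` and `target_policy` are hypothetical.

```python
def importance_sampling_value(logs, target_policy):
    """Estimate the target policy's average reward from behavior-policy logs.

    logs: iterable of (state, action, reward, behavior_prob) tuples, where
          behavior_prob is the probability the behavior policy gave to the
          logged action in that state (assumed known and > 0).
    target_policy: function (state, action) -> probability under the target policy.
    """
    total = 0.0
    n = 0
    for state, action, reward, behavior_prob in logs:
        weight = target_policy(state, action) / behavior_prob  # importance ratio
        total += weight * reward
        n += 1
    return total / n

# Toy logs: the behavior policy picks each of two actions with probability 0.5;
# action 1 always yields reward 1, action 0 yields reward 0.
logs = [(0, 0, 0.0, 0.5), (0, 1, 1.0, 0.5)]

def always_one(state, action):
    return 1.0 if action == 1 else 0.0

estimate = importance_sampling_value(logs, always_one)
```

If an unobserved variable influenced both the logged action and the reward, the recorded `behavior_prob` would no longer be the right correction, which is the bias the abstract describes.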

artificial intelligence, machine learning, reinforcement learning, (21 more...)

2007.13893

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)