AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Winning Blackjack using Machine Learning – Towards Data Science

#artificialintelligence | Feb-13-2019, 14:13:04 GMT

One of the great things about machine learning is that there are so many different approaches to solving problems. Neural networks are great for finding patterns in data, resulting in predictive capabilities that are truly impressive. Reinforcement learning uses rewards-based concepts, improving over time. And then there's the approach called a genetic algorithm. A genetic algorithm (GA) uses principles from evolution to solve problems.
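As a rough illustration of the evolutionary idea described above, here is a minimal genetic algorithm sketch. It does not reproduce the article's blackjack setup: the bitstring encoding, the "count the ones" fitness function, and all parameter values are assumptions chosen only to keep the example small.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption for this sketch): the number of 1s."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=30, mutation_rate=0.05, seed=0):
    """Evolve bitstrings toward higher fitness; returns (best, best-per-generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=one_max, reverse=True)
        history.append(one_max(population[0]))
        parents = population[: pop_size // 2]      # selection: keep the fitter half
        children = [population[0][:]]              # elitism: copy the best unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)         # single-point crossover
            child = mom[:cut] + dad[cut:]
            # mutation: flip each bit with a small probability
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
    population.sort(key=one_max, reverse=True)
    history.append(one_max(population[0]))
    return population[0], history

best, history = evolve()
print(one_max(best), history[0])
```

Because the best individual is carried over unchanged each generation, the best fitness never decreases; a real application would replace `one_max` with a problem-specific score.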

blackjack, data science, genetic algorithm

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Learning preferences by looking at the world

Robohub | Feb-13-2019, 03:59:08 GMT

It would be great if we could all have household robots do our chores for us. Chores are tasks that we want done to make our houses cater more to our preferences; they are a way in which we want our house to be different from the way it currently is. However, most "different" states are not very desirable: Surely our robot wouldn't be so dumb as to go around breaking stuff when we ask it to clean our house? Unfortunately, AI systems trained with reinforcement learning only optimize features specified in the reward function and are indifferent to anything we might've inadvertently left out. Generally, it is easy to get the reward wrong by forgetting to include preferences for things that should stay the same, since we are so used to having these preferences satisfied, and there are so many of them.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Robohub

Technology:

Information Technology > Artificial Intelligence > Robots (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

AI guides single-camera drone through hallways it's never seen before

#artificialintelligence | Feb-13-2019, 03:27:06 GMT

Researchers at the University of Colorado recently demonstrated a system that helps robots figure out the direction of hiking trails from camera footage, and scientists at ETH Zurich described in a January paper a machine learning framework that aids four-legged robots in getting up from the ground when they trip and fall. But might such AI perform just as proficiently when applied to a drone rather than machines planted firmly on the ground? A team at the University of California at Berkeley set out to find out. In a newly published paper on the preprint server arXiv ("Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight"), the team proposes a "hybrid" deep reinforcement learning algorithm that combines data from both a digital simulation and the real world to guide a quadcopter through carpeted corridors. "In this work, we … aim to devise a transfer learning algorithm where the physical behavior of the vehicle is learned," the paper's authors wrote. "In essence, real-world experience is used to learn how to fly, while simulated experience is used to learn how to generalize."

machine learning, real-world data, reinforcement learning, (13 more...)

#artificialintelligence

Country:

North America > United States > Colorado (0.26)
North America > United States > California (0.26)
Europe > Switzerland > Zürich > Zürich (0.26)

Genre: Research Report (0.37)

Industry: Transportation (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.40)

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

Chen, Gang, Peng, Yiming

arXiv.org Machine Learning | Feb-13-2019

We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model-free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Rényi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Rényi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for high-level action selection. Empirically, we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness.
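To make the entropy measures named in the abstract concrete, the sketch below computes the standard Shannon, Tsallis, and Rényi entropies of a discrete probability distribution. It is only an illustration of the textbook definitions: the function names are hypothetical, and the restriction to a finite action distribution is an assumption, not the paper's setting or implementation.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum_i p_i ln p_i."""
    return -sum(x * math.log(x) for x in p if x > 0)

def tsallis_entropy(p, q):
    """S_q(p) = (1 - sum_i p_i^q) / (q - 1); tends to Shannon as q -> 1."""
    if q == 1:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)

def renyi_entropy(p, alpha):
    """H_alpha(p) = ln(sum_i p_i^alpha) / (1 - alpha); tends to Shannon as alpha -> 1."""
    if alpha == 1:
        return shannon_entropy(p)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

policy = [0.7, 0.2, 0.1]  # hypothetical action probabilities
print(shannon_entropy(policy), tsallis_entropy(policy, 2.0), renyi_entropy(policy, 2.0))
```

Both generalized measures recover Shannon entropy in the limit where their parameter approaches 1, which is the sense in which they generalize it.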

actor-critic, algorithm, off-policy actor-critic, (14 more...)

arXiv.org Machine Learning

1902.05551

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis

Shah, Devavrat, Xie, Qiaomin, Xu, Zhi

arXiv.org Machine Learning | Feb-13-2019

Inspired by the success of AlphaGo Zero (AGZ) which utilizes Monte Carlo Tree Search (MCTS) with Supervised Learning via Neural Network to learn the optimal policy and value function, in this work, we focus on establishing formally that such an approach indeed finds optimal policy asymptotically, as well as establishing non-asymptotic guarantees in the process. We shall focus on infinite-horizon discounted Markov Decision Process to establish the results. To start with, it requires establishing the MCTS's claimed property in the literature that for any given query state, MCTS provides approximate value function for the state with enough simulation steps of MDP. We provide non-asymptotic analysis establishing this property by analyzing a non-stationary multi-arm bandit setup. Our proof suggests that MCTS needs to be utilized with polynomial rather than logarithmic "upper confidence bound" for establishing its desired performance -- interestingly enough, AGZ chooses such polynomial bound. Using this as a building block, combined with nearest neighbor supervised learning, we argue that MCTS acts as a "policy improvement" operator; it has a natural "bootstrapping" property to iteratively improve value function approximation for all states, due to combining with supervised learning, despite evaluating at only finitely many states. In effect, we establish that to learn $\varepsilon$ approximation of value function in $\ell_\infty$ norm, MCTS combined with nearest-neighbors requires samples scaling as $\widetilde{O}\big(\varepsilon^{-(d+4)}\big)$, where $d$ is the dimension of the state space. This is nearly optimal due to a minimax lower bound of $\widetilde{\Omega}\big(\varepsilon^{-(d+2)}\big).$
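The contrast between a logarithmic and a polynomial exploration bonus can be sketched as follows. The first function is the classic UCB1 bonus; the second is the PUCT-style bonus published for AlphaGo Zero, which grows with the square root of the parent visit count. The exact exponents and constants analyzed in the paper are not reproduced here, and the function names and default constants are assumptions for illustration.

```python
import math

def ucb1_bonus(parent_visits, child_visits, c=math.sqrt(2)):
    """Logarithmic bonus (UCB1 style): c * sqrt(ln(N) / n)."""
    return c * math.sqrt(math.log(parent_visits) / child_visits)

def puct_bonus(prior, parent_visits, child_visits, c_puct=1.0):
    """Polynomial bonus (AlphaGo Zero style): c * P * sqrt(N) / (1 + n)."""
    return c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

# For a fixed child visit count, the polynomial bonus grows much faster
# with the parent visit count than the logarithmic one.
for parent in (100, 10_000, 1_000_000):
    print(parent, ucb1_bonus(parent, 10), puct_bonus(1.0, parent, 10))
```

In a tree search, the bonus is added to a node's estimated value when choosing which child to explore, so a faster-growing bonus pushes the search back toward rarely visited children more aggressively.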

algorithm, mct, node, (14 more...)

arXiv.org Machine Learning

1902.05213

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Sample-Optimal Parametric Q-Learning with Linear Transition Models

Yang, Lin F., Wang, Mengdi

arXiv.org Machine Learning | Feb-13-2019

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model. We propose a parametric Q-learning algorithm that finds an approximate-optimal policy using a sample size proportional to the feature dimension $K$ and invariant with respect to the size of the state space. To further improve its sample efficiency, we exploit the monotonicity property and intrinsic noise structure of the Bellman operator, provided the existence of anchor state-actions that imply implicit non-negativity in the feature space. We augment the algorithm using techniques of variance reduction, monotonicity preservation, and confidence bounds. It is proved to find a policy which is $\epsilon$-optimal from any initial state with high probability using $\widetilde{O}(K/\epsilon^2(1-\gamma)^3)$ sample transitions for arbitrarily large-scale MDP with a discount factor $\gamma\in(0,1)$. A matching information-theoretical lower bound is proved, confirming the sample optimality of the proposed method with respect to all parameters (up to polylog factors).

algorithm, iteration, probability, (13 more...)

arXiv.org Machine Learning

1902.04779

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

Tian, Yuandong, Ma, Jerry, Gong, Qucheng, Sengupta, Shubho, Chen, Zhuoyuan, Pinkerton, James, Zitnick, C. Lawrence

arXiv.org Machine Learning | Feb-13-2019

The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are a remarkable demonstration of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the understanding of and usability of these promising approaches by the research community. Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. ELF OpenGo is the first open-source Go AI to convincingly demonstrate superhuman performance with a perfect (20:0) record against global top professionals. We apply ELF OpenGo to conduct extensive ablation studies, and to identify and analyze numerous interesting phenomena in both the model training and in the gameplay inference procedures. Our code, models, selfplay datasets, and auxiliary data are publicly available.

analysis and open reimplementation, elf opengo, prototype model, (12 more...)

arXiv.org Machine Learning

1902.04522

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment > Games > Go (1.00)
Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Games > Go (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems

Bastani, Osbert

arXiv.org Machine Learning | Feb-13-2019

Reinforcement learning is a promising approach to learning robot controllers. It has recently been shown that algorithms based on finite-difference estimates of the policy gradient are competitive with algorithms based on the policy gradient theorem. We propose a theoretical framework for understanding this phenomenon. Our key insight is that many dynamical systems (especially those of interest in robot control tasks) are nearly deterministic, i.e., they can be modeled as a deterministic system with a small stochastic perturbation. We show that for such systems, finite-difference estimates of the policy gradient can have substantially lower variance than estimates based on the policy gradient theorem. We interpret these results in the context of counterfactual estimation. Finally, we empirically evaluate our insights in an experiment on the inverted pendulum.
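A finite-difference gradient estimate of the kind mentioned in the abstract can be sketched in a few lines. This is a generic central-difference estimator, not the paper's method: the objective `toy_return`, the step size, and the noise-free setting are assumptions chosen so the result is easy to check by hand.

```python
def finite_difference_gradient(objective, theta, delta=1e-3):
    """Central-difference estimate of d objective / d theta, one coordinate at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += delta      # perturb parameter i upward
        minus[i] -= delta     # and downward
        grad.append((objective(plus) - objective(minus)) / (2 * delta))
    return grad

def toy_return(theta):
    """Hypothetical deterministic return, maximized at theta = (2, -1)."""
    return -((theta[0] - 2.0) ** 2) - (theta[1] + 1.0) ** 2

print(finite_difference_gradient(toy_return, [0.0, 0.0]))  # close to [4.0, -2.0]
```

Each coordinate needs two evaluations of the return, so the cost grows with the number of policy parameters; when the return is noisy, each evaluation would typically be averaged over several rollouts.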

algorithm, policy gradient, sample complexity, (7 more...)

arXiv.org Machine Learning

1901.08562

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Understanding The Impact of Partner Choice on Cooperation and Social Norms by means of Multi-agent Reinforcement Learning

Anastassacos, Nicolas, Hailes, Steve, Musolesi, Mirco

arXiv.org Artificial Intelligence | Feb-13-2019

The human ability to coordinate and cooperate has been vital to the development of societies for thousands of years. While it is not fully clear how this behavior arises, social norms are thought to be a key factor in this development. In contrast to laws set by authorities, norms tend to evolve in a bottom-up manner from interactions between members of a society. While much behavior can be explained through the use of social norms, it is difficult to measure the extent to which they shape society as well as how they are affected by other societal dynamics. In this paper, we discuss the design and evaluation of a reinforcement learning model for understanding how the opportunity to choose who you interact with in a society affects the overall societal outcome and the strength of social norms. We first study the emergence of norms and then the emergence of cooperation in the presence of norms. In our model, agents interact with other agents in a society in the form of repeated matrix games: coordination games and cooperation games. In particular, at each stage, agents are either able to choose a partner to interact with or are forced to interact at random, and they learn using policy gradients.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1902.03185

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Global Big Data Conference

#artificialintelligence | Feb-12-2019, 19:21:03 GMT

The capability of a machine to "learn" on its own is the subject of some debate. With traditional supervised machine learning, decisions can be optimized, but the machine isn't really learning by itself. Now a startup called Cogitai is hoping to push the limits of a machine's capability to learn continuously using reinforcement learning techniques. Cogitai was founded in 2015 by some of the earliest innovators in the reinforcement learning (RL) field, including Mark Ring, Peter Stone, and Pete Wurman. The Orange County, California company is hoping to leverage the collective RL knowledge of its founders and the 15 or so PhD computer scientists in the firm to change the course of AI applications.

cogitai, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Country: North America > United States > California > Orange County (0.26)

Industry: Information Technology (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)
Information Technology > Data Science > Data Mining > Big Data (0.40)
