AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Fast Online Exact Solutions for Deterministic MDPs with Sparse Rewards

Bertram, Joshua R., Yang, Xuxi, Wei, Peng

arXiv.org Machine LearningMay-7-2018

Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision making under uncertainty. The classical approaches for solving MDPs are well known and have been widely studied, some of which rely on approximation techniques to solve MDPs with large state space and/or action space. However, most of these classical solution approaches and their approximation techniques still take much computation time to converge and usually must be re-computed if the reward function is changed. This paper introduces a novel alternative approach for exactly and efficiently solving deterministic, continuous MDPs with sparse reward sources. When the environment is such that the "distance" between states can be determined in constant time, e.g. grid world, our algorithm offers $O( |R|^2 \times |A|^2 \times |S|)$, where $|R|$ is the number of reward sources, $|A|$ is the number of actions, and $|S|$ is the number of states. Memory complexity for the algorithm is $O( |S| + |R| \times |A|)$. This new approach opens new avenues for boosting computational performance for certain classes of MDPs and is of tremendous value for MDP applications such as robotics and unmanned systems. This paper describes the algorithm and presents numerical experiment results to demonstrate its powerful computational performance. We also provide rigorous mathematical description of the approach.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1805.02785

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Add feedback

A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation

Lam, Tsz Kin, Kreutzer, Julia, Riezler, Stefan

arXiv.org Machine LearningMay-7-2018

We present an approach to interactive-predictive neural machine translation that attempts to reduce human effort from three directions: Firstly, instead of requiring humans to select, correct, or delete segments, we employ the idea of learning from human reinforcements in form of judgments on the quality of partial translations. Secondly, human effort is further reduced by using the entropy of word predictions as uncertainty criterion to trigger feedback requests. Lastly, online updates of the model parameters after every interaction allow the model to adapt quickly. We show in simulation experiments that reward signals on partial translations significantly improve character F-score and BLEU compared to feedback on full translations only, while human effort can be reduced to an average number of $5$ feedback requests for every input.

machine learning, reinforcement learning, translation, (17 more...)

arXiv.org Machine Learning

1805.01553

Country:

Europe (1.00)
Asia > Middle East (0.68)
North America > United States > New York (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Recurrent Deterministic Policy Gradient Method for Bipedal Locomotion on Rough Terrain Challenge

Song, Doo Re, Yang, Chuanyu, McGreavy, Christopher, Li, Zhibin

arXiv.org Artificial IntelligenceMay-6-2018

This paper presents a deep learning framework that is capable of solving partially observable locomotion tasks based on our novel Recurrent Deterministic Policy Gradient (RDPG). Three major improvements are applied in our RDPG based learning framework: asynchronized backup of interpolated temporal difference, initialisation of hidden state using past trajectory scanning, and injection of external experiences learned by other agents. The proposed learning framework was implemented to solve the Bipedal-Walker challenge in OpenAI's gym simulation environment where only partial state information is available. Our simulation study shows that the autonomous behaviors generated by the RDPG agent are highly adaptive to a variety of obstacles and enables the agent to traverse rugged terrains effectively.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1710.02896

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

Developing parsimonious ensembles using ensemble diversity within a reinforcement learning framework

Stanescu, Ana, Pandey, Gaurav

arXiv.org Machine LearningMay-5-2018

Heterogeneous ensembles built from the predictions of a wide variety and large number of diverse base predictors represent a potent approach to building predictive models for problems where the ideal base/individual predictor may not be obvious. Ensemble selection is an especially promising approach here, not only for improving prediction performance, but also because of its ability to select a collectively predictive subset, often a relatively small one, of the base predictors. In this paper, we present a set of algorithms that explicitly incorporate ensemble diversity, a known factor influencing predictive performance of ensembles, into a reinforcement learning framework for ensemble selection. We rigorously tested these approaches on several challenging problems and associated data sets, yielding that several of them produced more accurate ensembles than those that don't explicitly consider diversity. More importantly, these diversity-incorporating ensembles were much smaller in size, i.e., more parsimonious, than the latter types of ensembles. This can eventually aid the interpretation or reverse engineering of predictive models assimilated into the resultant ensemble(s).

data mining, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1805.02103

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Deep Reinforcement Learning for Playing 2.5D Fighting Games

Li, Yu-Jhe, Chang, Hsin-Yu, Lin, Yu-Jing, Wu, Po-Wei, Wang, Yu-Chiang Frank

arXiv.org Machine LearningMay-5-2018

Deep reinforcement learning has shown its success in game playing. However, 2.5D fighting games would be a challenging task to handle due to ambiguity in visual appearances like height or depth of the characters. Moreover, actions in such games typically involve particular sequential action orders, which also makes the network design very difficult. Based on the network of Asynchronous Advantage Actor-Critic (A3C), we create an OpenAI-gym-like gaming environment with the game of Little Fighter 2 (LF2), and present a novel A3C+ network for learning RL agents. The introduced model includes a Recurrent Info network, which utilizes game-related info features with recurrent layers to observe combo skills for fighting. In the experiments, we consider LF2 in different settings, which successfully demonstrates the use of our proposed model for learning 2.5D fighting games.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1805.0207

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Add feedback

How I build an AI to play Dino Run – Acing AI – Medium

#artificialintelligenceMay-4-2018, 09:01:46 GMT

A 2013 publication by DeepMind titled'Playing Atari with Deep Reinforcement Learning' introduced a new deep learning model on similar lines for reinforcement learning, and demonstrated its ability to master difficult control policies for Atari 2600 computer games, using only raw pixels as input. My project was inspired from few implementations of this paper. I will try to explain the basics of Reinforcement Learning and dive deep into the code snippets for hands on understanding. Before we begin, as a prerequisite, I'm assuming you have basic knowledge of Deep Supervised Learning and Convolutional Neural Networks which are essential for understanding the project. Feel free to skip to code section if you're familiar with Reinforcement Learning and Q-Learning .

artificial intelligence, machine learning, reinforcement learning, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Facebook Open Sources ELF OpenGo

#artificialintelligenceMay-4-2018, 07:16:51 GMT

Today, Facebook AI Research (FAIR) open sourced ELF OpenGo, an AI bot that has defeated world champion professional Go players, based on our existing ELF platform for Reinforcement Learning Research. We are releasing both the trained model and the code used to create it. Inspired by DeepMind's work, we kicked off an effort earlier this year to reproduce their recent AlphaGoZero results using FAIR's Extensible, Lightweight Framework (ELF) for reinforcement learning research. The goal was to create an open source implementation of a system that would teach itself how to play Go at the level of a professional human player or better. By releasing our code and models we hoped to inspire others to think about new applications and research directions for this technology.

artificial intelligence, machine learning, reinforcement learning, (8 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

Socially Aware Motion Planning with Deep Reinforcement Learning

Chen, Yu Fan, Everett, Michael, Liu, Miao, How, Jonathan P.

arXiv.org Artificial IntelligenceMay-4-2018

For robotic vehicles to navigate safely and efficiently in pedestrian-rich environments, it is important to model subtle human behaviors and navigation rules (e.g., passing on the right). However, while instinctive to humans, socially compliant navigation is still difficult to quantify due to the stochasticity in people's behaviors. Existing works are mostly focused on using feature-matching techniques to describe and imitate human paths, but often do not generalize well since the feature values can vary from person to person, and even run to run. This work notes that while it is challenging to directly specify the details of what to do (precise mechanisms of human navigation), it is straightforward to specify what not to do (violations of social norms). Specifically, using deep reinforcement learning, this work develops a time-efficient navigation policy that respects common social norms. The proposed method is shown to enable fully autonomous navigation of a robotic vehicle moving at human walking speed in an environment with many pedestrians.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1703.08862

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Industry: Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning

Everett, Michael, Chen, Yu Fan, How, Jonathan P.

arXiv.org Artificial IntelligenceMay-4-2018

Robots that navigate among pedestrians use collision avoidance algorithms to enable safe and efficient operation. Recent works present deep reinforcement learning as a framework to model the complex interactions and cooperation. However, they are implemented using key assumptions about other agents' behavior that deviate from reality as the number of agents in the environment increases. This work extends our previous approach to develop an algorithm that learns collision avoidance among a variety of types of dynamic agents without assuming they follow any particular behavior rules. This work also introduces a strategy using LSTM that enables the algorithm to use observations of an arbitrary number of other agents, instead of previous methods that have a fixed observation size. The proposed algorithm outperforms our previous approach in simulation as the number of agents increases, and the algorithm is demonstrated on a fully autonomous robotic vehicle traveling at human walking speed, without the use of a 3D Lidar.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1805.01956

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.89)

Add feedback

Exploration by Distributional Reinforcement Learning

Tang, Yunhao, Agrawal, Shipra

arXiv.org Machine LearningMay-4-2018

We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning. We show that our proposed framework conceptually unifies multiple previous methods in exploration. We also derive a practical algorithm that achieves efficient exploration on challenging control tasks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1805.01907

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback