AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

A technique to improve machine learning inspired by the behavior of human infants

#artificialintelligence, Jul-19-2019, 11:14:48 GMT

From their first years of life, human beings have the innate ability to learn continuously and build mental models of the world, simply by observing and interacting with things or people in their surroundings. Cognitive psychology studies suggest that humans make extensive use of this previously acquired knowledge, particularly when they encounter new situations or when making decisions. Despite the significant recent advances in the field of artificial intelligence (AI), most virtual agents still require hundreds of hours of training to achieve human-level performance in several tasks, while humans can learn how to complete these tasks in a few hours or less. Recent studies have highlighted two key contributors to humans' ability to acquire knowledge so quickly--namely, intuitive physics and intuitive psychology. These intuition models, which have been observed in humans from early stages of development, might be the core facilitators of future learning.

human infant, machine learning, reinforcement learning, (7 more...)

Genre: Research Report (0.76)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments

Barati, Elaheh, Chen, Xuewen

arXiv.org Machine Learning, Jul-19-2019

In reinforcement learning algorithms, leveraging multiple views of the environment can improve the learning of complicated policies. In multi-view environments, because the views may frequently suffer from partial observability, their levels of importance often differ. In this paper, we propose a deep reinforcement learning method and an attention mechanism in a multi-view environment. Each view can provide various representative information about the environment. Through our attention mechanism, our method generates a single feature representation of the environment given its multiple views. It learns a policy to dynamically attend to each view based on its importance in the decision-making process. Through experiments, we show that our method outperforms its state-of-the-art baselines on the TORCS racing car simulator and three other complex 3D environments with obstacles. We also provide experimental results to evaluate the performance of our method under noisy conditions and partial observation settings.
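The idea of collapsing several views into one feature representation by attention can be illustrated with a small sketch. It is not the authors' architecture, which the abstract does not specify: the function names `softmax` and `fuse_views`, the feature vectors, and the per-view scores are all hypothetical, and in a real system the scores would presumably be produced by a learned network rather than given by hand.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_views(view_features, view_scores):
    """Attention-weighted sum of per-view feature vectors (hypothetical helper).

    view_features: list of equal-length feature vectors, one per view.
    view_scores:   one importance score per view (assumed given here).
    """
    weights = softmax(view_scores)
    dim = len(view_features[0])
    fused = [
        sum(w * f[i] for w, f in zip(weights, view_features))
        for i in range(dim)
    ]
    return fused, weights

# Three views of the same scene; the first is scored as most informative.
views = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [2.0, 0.0, -1.0]
fused, weights = fuse_views(views, scores)
print(weights, fused)
```

A view that is noisy or partially observed would receive a low score and therefore contribute little to the fused vector, which is the behavior the abstract attributes to the attention mechanism.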

adrl, machine learning, reinforcement learning, (18 more...)

1907.09466

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports > Motorsports (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

GPU-Accelerated Atari Emulation for Reinforcement Learning

Dalton, Steven, Frosio, Iuri, Garland, Michael

arXiv.org Machine Learning, Jul-19-2019

We designed and implemented a CUDA port of the Atari Learning Environment (ALE), a system for developing and evaluating deep reinforcement algorithms using Atari games. Our CUDA Learning Environment (CuLE) overcomes many limitations of existing CPU-based Atari emulators and scales naturally to multi-GPU systems. It leverages the parallelization capability of GPUs to run thousands of Atari games simultaneously; by rendering frames directly on the GPU, CuLE avoids the bottleneck arising from the limited CPU-GPU communication bandwidth. As a result, CuLE is able to generate between 40M and 190M frames per hour using a single GPU, a finding that could be previously achieved only through a cluster of CPUs. We demonstrate the advantages of CuLE by effectively training agents with traditional deep reinforcement learning algorithms and measuring the utilization and throughput of the GPU. Our analysis further highlights ...

Figure 1: In a typical DRL system, environments run on CPUs, whereas GPUs execute DNN operations. The limited CPU-GPU communication bandwidth and small set of CPU environments prevent full GPU utilization.

... benchmark for DRL [4, 14], and still represent a challenging set for the development of new DRL methods.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

1907.08467

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Delegative Reinforcement Learning: learning to avoid traps with a little help

Kosoy, Vanessa

arXiv.org Machine Learning, Jul-19-2019

Most known regret bounds for reinforcement learning are either episodic or assume an environment without traps. We derive a regret bound without making either assumption, by allowing the algorithm to occasionally delegate an action to an external advisor. We thus arrive at a setting of active one-shot model-based reinforcement learning that we call DRL (delegative reinforcement learning). The algorithm we construct in order to demonstrate the regret bound is a variant of Posterior Sampling Reinforcement Learning supplemented by a subroutine that decides which actions should be delegated. The algorithm is not anytime, since the parameters must be adjusted according to the target time discount. Currently, our analysis is limited to Markov decision processes with finite numbers of hypotheses, states and actions.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1907.08461

Country: North America > United States (0.93)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration

Wang, Qisheng, Wang, Qichao

arXiv.org Machine Learning, Jul-19-2019

Exploration efficiency is a challenging problem in multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the collaborative approach among multiple agents. Another important problem is that the less informative reward restricts the learning speed of MARL compared with the informative label in supervised learning. In this work, we leverage a novel communication method to guide MARL to accelerate exploration, and propose a predictive network to forecast the reward of the current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to better take advantage of the different knowledge learned by different agents, utilizing the temporal-difference (TD) error more effectively. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments. We remark that this algorithm can be extended to supervised learning to speed up its training.
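For readers unfamiliar with prioritized experience replay, the sketch below shows the standard proportional scheme, in which transitions with larger absolute TD error are replayed more often. This is the baseline technique, not the improved variant the abstract refers to (whose details are not given here); the function names and the values of `alpha` and `eps` are assumptions chosen for illustration.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: p_i is proportional to (|td_i| + eps) ** alpha.

    alpha = 0 gives uniform sampling; larger alpha favors high-error transitions.
    eps keeps zero-error transitions from never being sampled.
    """
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]

def sample_indices(td_errors, batch_size, rng, alpha=0.6):
    """Draw replay-buffer indices (with replacement) according to priority."""
    probs = sampling_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)

# TD errors of four stored transitions (made-up numbers).
td_errors = [0.1, 2.0, 0.5, 0.0]
probs = sampling_probabilities(td_errors)
batch = sample_indices(td_errors, batch_size=3, rng=random.Random(0))
print(probs, batch)
```

In a multi-agent setting one could keep a shared buffer so that high-error transitions from any agent are replayed by all of them, but how the paper combines agents' experience is not stated in the abstract.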

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1907.07847

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Combinatorial Keyword Recommendations for Sponsored Search with Deep Reinforcement Learning

Li, Zhipeng, Wu, Jianwei, Sun, Lin, Rong, Tao

arXiv.org Machine Learning, Jul-18-2019

In sponsored search, keyword recommendations help advertisers achieve much better performance within a limited budget. Much work has been done to mine numerous candidate keywords from search logs or landing pages. However, the strategy for selecting from given candidates remains to be improved. The existing relevance-based, popularity-based and regular combinatorial strategies fail to take the internal or external competition among keywords into consideration. In this paper, we regard keyword recommendation as a combinatorial optimization problem and solve it with a modified pointer network structure. The model is trained on an actor-critic based deep reinforcement learning framework. A pre-clustering method called Equal Size K-Means is proposed to accelerate the training and testing procedure on the framework by reducing the action space. The performance of the framework is evaluated in both offline and online environments, and remarkable improvements can be observed.

advertiser, competition, keyword, (13 more...)

1907.08686

Country:

North America > United States > Alaska > Anchorage Municipality > Anchorage (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Marketing (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

An intelligent financial portfolio trading strategy using deep Q-learning

Park, Hyungjun, Sim, Min Kyu, Choi, Dong Gu

arXiv.org Artificial Intelligence, Jul-18-2019

A goal of financial portfolio trading is maximizing the trader's utility by allocating capital to assets in a portfolio in the investment horizon. Our study suggests an approach for deriving an intelligent portfolio trading strategy using deep Q-learning. In this approach, we introduce a Markov decision process model to enable an agent to learn about the financial environment and develop a deep neural network structure to approximate a Q-function. In addition, we devise three techniques to derive a trading strategy that chooses reasonable actions and is applicable to the real world. First, the action space of the learning agent is modeled as an intuitive set of trading directions that can be carried out for individual assets in the portfolio. Second, we introduce a mapping function that can replace an infeasible agent action in each state with a similar and valuable action to derive a reasonable trading strategy. Last, we introduce a method by which an agent simulates all feasible actions and learns about these experiences to utilize the training data efficiently. To validate our approach, we conduct backtests for two representative portfolios, and we find that the intelligent strategy derived using our approach is superior to the benchmark strategies.
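The second technique, replacing an infeasible action with a similar and valuable one, can be sketched as follows. Everything specific here is an assumption made for illustration, since the abstract does not define the action encoding, the feasibility rule, or the similarity measure: actions are encoded per asset as -1 (sell), 0 (hold), 1 (buy) of one unit; an action is infeasible if it sells an asset that is not held or buys more than the available cash allows; similarity is the number of assets whose direction is unchanged; and ties are broken by a given table of Q-values.

```python
from itertools import product

def feasible(action, holdings, cash, price):
    """Hypothetical rule: no selling unheld assets, no buying beyond cash."""
    cost = 0.0
    for a, h, p in zip(action, holdings, price):
        if a == -1 and h <= 0:
            return False
        if a == 1:
            cost += p
    return cost <= cash

def map_action(action, q_values, holdings, cash, price):
    """Return the action itself if feasible, else the closest feasible action.

    Closeness is the number of assets whose trading direction changes;
    among equally close candidates the one with the highest Q-value wins.
    """
    if feasible(action, holdings, cash, price):
        return action
    candidates = [
        c for c in product((-1, 0, 1), repeat=len(action))
        if feasible(c, holdings, cash, price)
    ]
    def rank(c):
        changed = sum(x != y for x, y in zip(c, action))
        return (changed, -q_values.get(c, 0.0))
    return min(candidates, key=rank)

# Two assets; the agent holds none of asset 0 and one unit of asset 1.
holdings, cash, price = [0, 1], 10.0, [5.0, 20.0]
q_values = {(0, 0): 0.1, (1, 0): 0.4, (0, -1): 0.2}  # made-up Q estimates
mapped = map_action((-1, -1), q_values, holdings, cash, price)
print(mapped)
```

Enumerating all 3^n direction vectors is only practical for small portfolios; it is used here to keep the sketch short, and holding every asset is always feasible when cash is non-negative, so the candidate list is never empty.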

artificial intelligence, machine learning, reinforcement learning, (20 more...)

1907.03665

Genre: Research Report > New Finding (0.93)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dynamical Distance Learning for Unsupervised and Semi-Supervised Skill Discovery

Hartikainen, Kristian, Geng, Xinyang, Haarnoja, Tuomas, Levine, Sergey

arXiv.org Artificial Intelligence, Jul-18-2019

Reinforcement learning requires manual specification of a reward function to learn a task. While in principle this reward function only needs to specify the task goal, in practice reinforcement learning can be very time-consuming or even infeasible unless the reward function is shaped so as to provide a smooth gradient towards a successful outcome. This shaping is difficult to specify by hand, particularly when the task is learned from raw observations, such as images. In this paper, we study how we can automatically learn dynamical distances: a measure of the expected number of time steps to reach a given goal state from any other state. These dynamical distances can be used to provide well-shaped reward functions for reaching new goals, making it possible to learn complex tasks efficiently. We also show that dynamical distances can be used in a semi-supervised regime, where unsupervised interaction with the environment is used to learn the dynamical distances, while a small amount of preference supervision is used to determine the task goal, without any manually engineered reward function or goal examples. We evaluate our method both in simulation and on a real-world robot. We show that our method can learn locomotion skills in simulation without any supervision. We also show that it can learn to turn a valve with a real-world 9-DoF hand, using raw image observations and ten preference labels, without any other supervision. Videos of the learned skills can be found on the project website: https://sites.google.com/view/skills-via-distance-learning.
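A dynamical distance, as defined above, is the expected number of time steps needed to reach a goal state from another state. The toy sketch below estimates it from recorded trajectories in a tabular setting and uses its negative as a shaped reward. It is a simplification under stated assumptions: states are hashable labels rather than images, the estimator is a plain average rather than a learned network, and the helper names `distance_targets`, `fit_tabular_distance`, and `shaped_reward` are hypothetical.

```python
from collections import defaultdict

def distance_targets(trajectory):
    """Yield (state, later_state, steps) for each ordered pair in a trajectory."""
    for i, state in enumerate(trajectory):
        for j in range(i, len(trajectory)):
            yield state, trajectory[j], j - i

def fit_tabular_distance(trajectories):
    """Average observed step counts for every (state, goal) pair seen in the data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for trajectory in trajectories:
        for state, goal, steps in distance_targets(trajectory):
            sums[(state, goal)] += steps
            counts[(state, goal)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

def shaped_reward(distance, state, goal):
    """Closer to the goal means higher reward; only defined for observed pairs."""
    return -distance[(state, goal)]

# Two short rollouts over symbolic states (made-up data).
trajectories = [["a", "b", "c"], ["a", "c"]]
distance = fit_tabular_distance(trajectories)
print(distance[("a", "c")], shaped_reward(distance, "b", "c"))
```

Because the estimate comes from the agent's own rollouts, it needs no hand-written reward; choosing which state to treat as the goal is the part that, per the abstract, can be settled with a small number of preference labels.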

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1907.08225

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Online (0.62)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Credit Assignment as a Proxy for Transfer in Reinforcement Learning

Ferret, Johan, Marinier, Raphaël, Geist, Matthieu, Pietquin, Olivier

arXiv.org Artificial Intelligence, Jul-18-2019

The ability to transfer representations to novel environments and tasks is a sensible requirement for general learning agents. Despite the apparent promises, transfer in Reinforcement Learning is still an open and under-exploited research area. In this paper, we suggest that credit assignment, regarded as a supervised learning task, could be used to accomplish transfer. Our contribution is twofold: we introduce a new credit assignment mechanism based on self-attention, and show that the learned credit can be transferred to in-domain and out-of-domain scenarios.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1907.08027

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Transfer Learning Across Simulated Robots With Different Sensors

Plisnier, Hélène, Steckelmacher, Denis, Roijers, Diederik, Nowé, Ann

arXiv.org Artificial Intelligence, Jul-18-2019

For a robot to learn a good policy, it often requires expensive equipment (such as sophisticated sensors) and a prepared training environment conducive to learning. However, for economic reasons it is seldom possible to equip robots perfectly, or to guarantee ideal learning conditions when they are deployed in real-life environments. A solution would be to prepare the robot in the lab environment, where all necessary material is available to learn a good policy. After training in the lab, the robot should be able to get by without the expensive equipment that used to be available to it, and yet still be guaranteed to perform well in the field. The transition between the lab (source) and the real-world environment (target) is related to transfer learning, where the state space differs between the source and target tasks. We tackle a simulated task with continuous states and discrete actions presenting this challenge, using Bootstrapped Dual Policy Iteration (BDPI), a model-free actor-critic reinforcement learning algorithm, and Policy Shaping. Specifically, we train a BDPI agent, embodied by a virtual robot performing a task in the V-Rep simulator, sensing its environment through several proximity sensors. The resulting policy is then used by a second agent learning the same task in the same environment, but with camera images as input. The goal is to obtain a policy able to perform the task relying merely on camera images.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1907.07958

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
