AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Shi, Wenjie, Song, Shiji, Wu, Cheng, Chen, C. L. Philip

arXiv.org Artificial Intelligence, Sep-7-2019

This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Unlike existing policy gradient methods, which employ a single actor-critic but cannot achieve satisfactory tracking control accuracy or stable learning, our proposed algorithm achieves high tracking control accuracy for AUVs and stable learning by applying a hybrid actors-critics architecture, in which multiple actors and critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error is used to choose the worst critic to be updated at each time step. Subsequently, to calculate the loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which uses a sub-greedy policy in place of the greedy policy in Q-learning, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of the action-value function and to stabilize learning. As for the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based Deterministic Policy Gradient (MPQ-DPG) algorithm are verified by applying it to an AUV with two different reference trajectories. The results demonstrate high tracking control accuracy and stable learning of MPQ-DPG, and also indicate that increasing the number of actors and critics further improves performance.
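Two steps named in this abstract, picking the worst critic by expected absolute Bellman error and averaging the actors into the final policy, can be illustrated with a minimal sketch. The function names, the array layout, and the reading of "worst" as "largest mean absolute error over a batch" are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def worst_critic_index(abs_bellman_errors):
    """Return the index of the critic with the largest mean absolute Bellman error.

    abs_bellman_errors: array of shape (num_critics, batch_size) holding
    |target - Q_k(s, a)| for each critic k (hypothetical layout).
    """
    return int(np.argmax(np.mean(abs_bellman_errors, axis=1)))

def averaged_policy(actors, state):
    """Final policy: the element-wise average of every actor's action."""
    return np.mean([actor(state) for actor in actors], axis=0)

# Toy usage with two stand-in linear "actors" (assumed, for illustration only).
actors = [lambda s: 2.0 * s, lambda s: 4.0 * s]
action = averaged_policy(actors, np.array([1.0, -1.0]))
```

Averaging the actors means a single actor that took a large, poor update is damped by the others, which is the motivation the abstract gives for this choice.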

machine learning, reinforcement learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

1909.03204

Country:

North America > United States (0.93)
Asia (0.69)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Shi, Wenjie, Song, Shiji, Wu, Cheng

arXiv.org Artificial Intelligence, Sep-7-2019

Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality, such as 3D humanoid locomotion. In addition, the optimality of the desired Boltzmann policy set for a non-optimal soft value function is not sufficiently persuasive. In this paper, we first derive the soft policy gradient based on an entropy-regularized expected reward objective for RL with continuous actions. Then, we present an off-policy, actor-critic, model-free maximum entropy deep RL algorithm called deep soft policy gradient (DSPG) by combining the soft policy gradient with the soft Bellman equation. To ensure stable learning while eliminating the need for two separate critics for soft value functions, we leverage a double sampling approach to make the soft Bellman equation tractable. The experimental results demonstrate that our method outperforms prior off-policy methods.
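For orientation, the entropy-regularized objective and the soft Bellman equation mentioned here are commonly written as below. This is the standard maximum entropy RL form; the symbols (temperature $\alpha$, discount $\gamma$, state-action distribution $\rho_\pi$) are assumed notation and may differ from the paper's.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

Q(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1}}\left[ V(s_{t+1}) \right],
\qquad
V(s_t) = \mathbb{E}_{a_t \sim \pi}\left[ Q(s_t, a_t) - \alpha \log \pi(a_t \mid s_t) \right]
```

Setting $\alpha = 0$ recovers the usual expected-reward objective and the ordinary Bellman equation.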

artificial intelligence, machine learning, reinforcement learning, (10 more...)

arXiv.org Artificial Intelligence

1909.03198

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.84)

Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Zhao, Yiren, Shumailov, Ilia, Cui, Han, Gao, Xitong, Mullins, Robert, Anderson, Ross

arXiv.org Machine Learning, Sep-6-2019

Recent research on reinforcement learning has shown that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how adversarial samples against RL agents can be generalised from White-box and Grey-box attacks to a strong Black-box case, namely where the attacker has no knowledge of the agents and their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. Our approximation model, based on time-series information from the agent, successfully predicts agents' future actions with consistently above 80% accuracy on a wide range of games and training methods. Second, we find that although such adversarial samples are transferable, they do not outperform random Gaussian noise as a means of reducing the game scores of trained RL agents. This highlights a serious methodological deficiency in previous work on such agents; random jamming should have been taken as the baseline for evaluation. Third, we do find a novel use for adversarial samples in this context: they can be used to trigger a trained agent to misbehave after a specific delay. This appears to be a genuinely new type of attack; it potentially enables an attacker to use devices controlled by RL agents as time bombs.
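The random Gaussian noise baseline that the abstract argues for can be sketched in a few lines. The function name, the assumption that observations are arrays scaled to [0, 1], and the clipping step are all illustrative choices, not details taken from the paper.

```python
import numpy as np

def gaussian_jam(observation, sigma, rng, low=0.0, high=1.0):
    """Perturb an observation with zero-mean Gaussian noise (random jamming).

    sigma is the noise standard deviation; the result is clipped back into
    the assumed valid observation range [low, high].
    """
    noise = rng.normal(0.0, sigma, size=observation.shape)
    return np.clip(observation + noise, low, high)

# Toy usage on a stand-in 4x4 "frame" (hypothetical input).
rng = np.random.default_rng(0)
frame = np.full((4, 4), 0.5)
jammed = gaussian_jam(frame, 0.1, rng)
```

An evaluation in this spirit would feed `jammed` frames to the agent and compare the resulting score drop against the drop caused by crafted adversarial samples of comparable magnitude.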

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1909.02918

Genre: Research Report > New Finding (0.48)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

Yang, Long, Zhang, Yu, Zheng, Qian, Li, Pengfei, Pan, Gang

arXiv.org Machine Learning, Sep-6-2019

Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach that unifies them with eligibility traces through the sampling degree $\sigma$. However, it is limited to the tabular case: for large-scale learning, Q$(\sigma,\lambda)$ is too expensive because it requires a huge volume of tables to store value functions accurately. To address this problem, we propose GQ$(\sigma,\lambda)$, which extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$ with a combination of full-sampling and pure-expectation reaches better performance than either full-sampling or pure-expectation methods alone.
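The role of the sampling degree $\sigma$ is easiest to see in the one-step backup target, which interpolates between a sampled next-action value and an expectation over the policy. The sketch below covers only that one-step target (no eligibility traces, no function approximation, no gradient correction), so it is not the GQ$(\sigma,\lambda)$ algorithm itself; the function and argument names are assumptions.

```python
import numpy as np

def q_sigma_target(reward, gamma, q_next, pi_next, a_next, sigma):
    """One-step Q(sigma) target for a transition into the next state.

    q_next:  action values Q(s', .) for the next state
    pi_next: policy probabilities pi(. | s') for the next state
    a_next:  index of the action actually sampled in s'
    sigma:   1.0 gives the full-sampling target, 0.0 the pure-expectation target
    """
    sampled = q_next[a_next]                    # value of the sampled action
    expected = float(np.dot(pi_next, q_next))   # expectation under the policy
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)

# Toy usage: an intermediate sigma mixes the two targets.
target = q_sigma_target(1.0, 0.9, np.array([1.0, 3.0]), np.array([0.5, 0.5]), 1, 0.5)
```

With linear function approximation, `q_next` would be computed as a dot product between a weight vector and the features of each next state-action pair instead of being read from a table.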

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1909.02877

Country: North America > Canada > Alberta (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)

DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning

Jaunet, Theo, Vuillemot, Romain, Wolf, Christian

arXiv.org Artificial Intelligence, Sep-6-2019

We present DRLViz, a visual analytics interface to interpret the internal memory of an agent (e.g. a robot) trained using deep reinforcement learning. This memory is composed of large temporal vectors updated when the agent moves in an environment and is not trivial to understand. It is often referred to as a black box, as only inputs (images) and outputs (actions) are intelligible to humans. DRLViz assists experts in interpreting the memory using memory reduction interactions, in investigating the role of parts of the memory when errors have been made, and ultimately in improving the agent training process. We report on several examples of use of DRLViz in the context of a video game simulator (ViZDoom) for a navigation scenario with item gathering tasks. We also report on an expert evaluation using DRLViz, on the applicability of DRLViz to other scenarios and navigation problems beyond simulation games, and on its contribution to the interpretability and explainability of black box models in the field of visual analytics.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1909.02982

Country: Europe > Austria (0.28)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning Action-Transferable Policy with Action Embedding

Chen, Yu, Chen, Yingfeng, Yang, Yu, Li, Ying, Yin, Jianwei, Fan, Changjie

arXiv.org Artificial Intelligence, Sep-6-2019

Despite achieving great success in various sequential decision tasks, deep reinforcement learning is extremely data inefficient. Many approaches have been proposed to improve data efficiency. Previous research on transfer learning mostly attempts to learn a common feature space of states across related tasks to exploit knowledge as much as possible. However, semantic information about actions may be shared as well, even between tasks with action spaces of different sizes. In this work, we first propose a method to learn action embeddings for discrete actions in RL from generated trajectories without any prior knowledge, and then leverage them to transfer policies across tasks with different state spaces and/or discrete action spaces. Our experimental results show that our method can effectively learn informative action embeddings and accelerate learning by policy transfer across tasks.

Introduction: Deep reinforcement learning (DRL), which combines reinforcement learning algorithms and deep neural networks, has achieved great success in many domains, such as playing Atari games (Mnih et al. 2015), playing the game of Go (Silver et al. 2016) and robotics control (Levine et al. 2016). Although DRL is viewed as one of the most promising routes to general artificial intelligence, it is still criticized for its data inefficiency. Training an agent from scratch requires a considerable number of interactions with the environment for a very specific task.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1909.02291

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

#artificialintelligence, Sep-5-2019, 11:48:32 GMT

This is the second post of the series, in which we will talk about a novel Hierarchical Reinforcement Learning algorithm built upon HIerarchical Reinforcement learning with Off-policy correction (HIRO), which we discussed in the previous post. This post comprises two sections. In the first section, we first compare the architectures of representation learning for HRL and HIRO; then we start from Claim 4 in the paper, seeing how to learn good representations that lead to bounded sub-optimality and how the intrinsic reward for the low-level policy is defined; we provide the pseudocode for the algorithm at the end of this section. In the Discussion section, we bring some insight into the algorithm and connect the low-level policy to the probabilistic graphical model to build some intuition. Different from HIRO, in which goals serve as a measure of dissimilarity between the current state and the desired state, goals here are used to directly produce a lower-level policy in conjunction with the current state.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Data-Efficient Hierarchical Reinforcement Learning -- HIRO

#artificialintelligence, Sep-5-2019, 11:48:10 GMT

Traditional reinforcement learning algorithms have achieved encouraging success in recent years. Their nature of reasoning on the atomic scale, however, makes them hard to scale to complex tasks. Hierarchical Reinforcement Learning (HRL) introduces high-level abstraction, whereby the agent is able to plan on different scales. In this post, we discuss an HRL algorithm proposed by Ofir Nachum et al. of Google Brain at NIPS 2018. The algorithm, known as HIerarchical Reinforcement learning with Off-policy correction (HIRO), is designed for goal-directed tasks, in which the agent tries to reach some goal state.

hierarchical reinforcement learning, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

This Obscure Area of Game Theory can Help to Scale Reinforcement Learning to Infinite Agents

#artificialintelligence, Sep-5-2019, 11:46:41 GMT

Reinforcement learning is one of the most popular areas of research in deep learning nowadays. Part of its popularity is due to the fact that it is one of the learning methods that most closely resembles human cognition. In reinforcement learning scenarios, an agent learns organically by taking actions in an environment and receiving specific rewards. A lesser-known discipline called multi-agent reinforcement learning (MARL) focuses on reinforcement learning scenarios involving a large number of agents. Typically, MARL scenarios suffer from a scalability challenge in which their complexity increases linearly with the number of agents in the environment.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control

Zhang, Sai Qian, Zhang, Qi, Lin, Jieyu

arXiv.org Machine Learning, Sep-5-2019

Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications. However, achieving efficient communication among agents has always been an overarching problem in MARL. In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL. By limiting the variance of the messages exchanged between agents during the training phase, the noisy component of the messages can be eliminated effectively, while the useful part can be preserved and utilized by the agents for better performance. Our evaluation using a challenging set of StarCraft II benchmarks indicates that our method achieves $2$-$10\times$ lower communication overhead than state-of-the-art MARL algorithms, while allowing agents to collaborate better by developing sophisticated strategies.
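One simple way to limit message variance during training is to add a variance penalty to the training loss. The sketch below shows that idea only; the function name, the per-message variance measure, and the penalty weight are assumptions for illustration, and the paper's actual loss may be defined differently.

```python
import numpy as np

def variance_penalized_loss(td_loss, messages, weight):
    """Add a penalty on the variance of each agent's outgoing message.

    td_loss:  scalar task loss (for example a temporal-difference loss)
    messages: array of shape (num_agents, message_dim), one message per agent
    weight:   hypothetical coefficient trading task loss against variance
    """
    per_message_variance = np.var(messages, axis=1)  # variance across message entries
    return td_loss + weight * float(np.sum(per_message_variance))

# Toy usage: the first message is flat (zero variance), the second is not.
loss = variance_penalized_loss(1.0, np.array([[1.0, 1.0], [0.0, 2.0]]), 0.5)
```

Under a penalty of this kind, messages that carry little information are pushed toward low variance, which gives a natural signal for deciding which messages are worth transmitting.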

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1909.02682

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
