AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

Cui, Guoxin, Xu, Jun, Zeng, Wei, Lan, Yanyan, Guo, Jiafeng, Cheng, Xueqi

arXiv.org Machine Learning, Apr-22-2018

One of the most significant bottlenecks in training large-scale machine learning models on a parameter server (PS) is the communication overhead, because the model gradients must be exchanged frequently between the workers and servers during the training iterations. Gradient quantization has been proposed as an effective approach to reducing the communication volume. One key issue in gradient quantization is setting the number of bits for quantizing the gradients. A small number of bits can significantly reduce the communication overhead but hurts the gradient accuracy, and vice versa. An ideal quantization method would dynamically balance the communication overhead and model accuracy by adjusting the number of bits according to the knowledge learned from the immediately preceding training iterations. Existing methods, however, quantize the gradients either with a fixed number of bits or with predefined heuristic rules. In this paper we propose a novel adaptive quantization method within the framework of reinforcement learning. The method, referred to as MQGrad, formalizes the selection of quantization bits as actions in a Markov decision process (MDP), where the MDP states record the information collected from the past optimization iterations (e.g., the sequence of the loss function values). During the training iterations of a machine learning algorithm, MQGrad continuously updates the MDP state according to the changes of the loss function. Based on this information, MQGrad learns to select the optimal actions (number of bits) to quantize the gradients. Experimental results based on a benchmark dataset showed that MQGrad can accelerate the learning of a large-scale deep neural network while keeping its prediction accuracy.
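
To make the bits-versus-accuracy trade-off described in this abstract concrete, here is a minimal sketch of a generic min-max uniform quantizer. It is not MQGrad's quantization scheme (the abstract does not specify one); the function names `quantize` and `mean_abs_error` and the synthetic Gaussian "gradient" are assumptions made purely for illustration.

```python
import random


def quantize(grad, num_bits):
    """Uniformly quantize a list of floats onto 2**num_bits levels.

    Generic min-max quantizer for illustration only; not the scheme
    used by MQGrad.
    """
    lo, hi = min(grad), max(grad)
    levels = 2 ** num_bits - 1  # number of intervals between levels
    if hi == lo or levels == 0:
        return [lo] * len(grad)
    step = (hi - lo) / levels
    # A worker would transmit the integer code round((g - lo) / step)
    # using num_bits bits; the server reconstructs lo + code * step.
    return [lo + round((g - lo) / step) * step for g in grad]


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


random.seed(0)
# Synthetic stand-in for a gradient vector (assumption, not real data).
grad = [random.gauss(0.0, 1.0) for _ in range(1000)]
errors = {bits: mean_abs_error(grad, quantize(grad, bits)) for bits in (2, 4, 8)}
for bits, err in errors.items():
    print(f"{bits} bits -> mean absolute error {err:.4f}")
```

Fewer bits mean fewer distinct values to transmit but a larger reconstruction error, which is the quantity an adaptive method would have to balance against communication cost.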

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1804.08066

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

Bertsekas, Dimitri P.

arXiv.org Machine Learning, Apr-22-2018

In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function or the cost functions of policies of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.

approximation, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1804.04577

Country:

North America > United States > Massachusetts > Middlesex County (0.28)
North America > United States > California (0.27)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games > Chess (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Hallucinogenic Deep Reinforcement Learning using Python and Keras

#artificialintelligence, Apr-21-2018, 05:37:30 GMT

This post is a step-by-step guide through the paper. We'll cover the technical details and also walk through how you can get a version running on your own machine. Similarly to my post on AlphaZero, I'm not associated with the authors of the paper but just wanted to share my interpretation of their terrific work. We're going to build a reinforcement learning algorithm (an 'agent') that gets good at driving a car around a 2D racetrack. At each time-step, the algorithm is fed an observation (a 64 x 64 pixel colour image of the car and immediate surroundings) and needs to return the next set of actions to take -- specifically, the steering direction (-1 to 1), acceleration (0 to 1) and brake (0 to 1). This action is then passed to the environment, which returns the next observation and the cycle starts again.
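
The observation-action cycle described above can be sketched roughly as follows. The `DummyRacetrack` class, its reward, and `random_agent` are hypothetical stand-ins invented for this illustration; they are not the environment or the agent from the post, and only the observation shape and action ranges are taken from the text.

```python
import random


class DummyRacetrack:
    """Hypothetical stand-in environment (not the one used in the post)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def _observation(self):
        # 64 x 64 colour image: rows x columns x (R, G, B), values 0..255.
        return [[[random.randrange(256) for _ in range(3)]
                 for _ in range(64)] for _ in range(64)]

    def reset(self):
        self.t = 0
        return self._observation()

    def step(self, action):
        steering, acceleration, brake = action
        self.t += 1
        reward = acceleration - brake  # made-up reward, for illustration only
        done = self.t >= self.episode_length
        return self._observation(), reward, done


def random_agent(observation):
    """Ignore the image and return (steering, acceleration, brake)."""
    return (random.uniform(-1, 1), random.uniform(0, 1), random.uniform(0, 1))


def run_episode(env, agent):
    """Feed observations to the agent and actions to the environment."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = agent(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward


random.seed(0)
steps, total_reward = run_episode(DummyRacetrack(), random_agent)
print(steps, total_reward)
```

A learning agent would replace `random_agent` with a function that actually uses the image; the loop itself stays the same.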

artificial intelligence, deep learning, machine learning, (12 more...)

#artificialintelligence

Genre: Research Report (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Paper Repro: Deep Neuroevolution – Towards Data Science

@machinelearnbot, Apr-20-2018, 07:30:17 GMT

In this post, we reproduce the recent Uber paper "Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning", which amazingly showed that simple genetic algorithms sometimes performed better than apparently advanced reinforcement learning algorithms on well-studied problems such as Atari games. We will ourselves reach state-of-the-art performance on Frostbite, a game that had stumped reinforcement learning algorithms for years before Uber finally solved it with this paper. We will also learn about the dark art of training neural networks using genetic algorithms. In a way this could be considered part 3 of my deep reinforcement learning series, but I think this article can also stand alone. Note that unlike these previous tutorials, this post will be using PyTorch instead of Keras, mainly because this is what I personally have switched to, but also because PyTorch does happen to be more suited for this particular use case.
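
For readers unfamiliar with genetic algorithms, a minimal sketch of the general idea follows: keep the best-scoring parameter vectors, copy them with random mutations, and repeat. This is a generic illustration on a toy objective in plain Python, not the post's PyTorch code or the paper's exact procedure; the `fitness` function and all hyperparameters are assumptions.

```python
import random


def fitness(params):
    """Toy objective (assumption): highest when every parameter equals 3."""
    return -sum((p - 3.0) ** 2 for p in params)


def evolve(num_params=5, pop_size=50, num_parents=10, sigma=0.1,
           generations=200, seed=0):
    """Mutation-only genetic algorithm with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(num_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: only the top num_parents may reproduce.
        population.sort(key=fitness, reverse=True)
        parents = population[:num_parents]
        # Each child is a randomly chosen parent plus Gaussian noise.
        children = [[p + rng.gauss(0.0, sigma) for p in rng.choice(parents)]
                    for _ in range(pop_size - 1)]
        # Elitism: the best individual survives unchanged.
        population = [parents[0]] + children
    return max(population, key=fitness)


best = evolve()
print([round(p, 2) for p in best], round(fitness(best), 4))
```

In the reinforcement learning setting, `params` would be the weights of a policy network and `fitness` the score obtained by playing an episode with those weights; no gradients are needed.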

artificial intelligence, machine learning, reinforcement learning, (20 more...)

@machinelearnbot

Industry:

Information Technology > Services (0.69)
Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Study on Overfitting in Deep Reinforcement Learning

Zhang, Chiyuan, Vinyals, Oriol, Munos, Remi, Bengio, Samy

arXiv.org Machine Learning, Apr-20-2018

Deep neural networks have proved to be effective function approximators in Reinforcement Learning (RL). Significant progress is seen in many RL problems ranging from board games like Go (Silver et al., 2016, 2017b), Chess and Shogi (Silver et al., 2017a), video games like Atari (Mnih et al., 2015) and StarCraft (Vinyals et al., 2017), to real-world robotics and control tasks (Lillicrap et al., 2016). Most of these successes are due to improved training algorithms, carefully designed neural network architectures and powerful hardware. For example, in AlphaZero (Silver et al., 2017a), 5,000 1st-generation TPUs and 64 2nd-generation TPUs are used during self-play based training of agents with deep residual networks (He et al., 2016). On the other hand, learning with high-capacity models and long training times on powerful devices could lead to a risk of overfitting (Hardt et al., 2016; Lin et al., 2016). As a fundamental tradeoff in machine learning, preventing overfitting by properly controlling or regularizing the training is key to out-of-sample generalization. Studies of overfitting could be performed from the theory side, where generalization guarantees are derived for specific learning algorithms; or from the practice side, where carefully designed experimental protocols like cross validation are used as a proxy to certify the generalization performance. Unfortunately, in the regime of deep RL, systematic studies of generalization behaviors from either theoretical or empirical perspectives are falling behind the rapid progress on the algorithm development and application side. The current situation not only makes it difficult to understand test behaviors such as vulnerability to potential adversarial attacks (Huang et al., 2017), but also renders some results difficult to reproduce or compare (Henderson et al., 2017; Machado et al., 2017).

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

1804.06893

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Subgoal Discovery for Hierarchical Dialogue Policy Learning

Tang, Da, Li, Xiujun, Gao, Jianfeng, Wang, Chong, Li, Lihong, Jebara, Tony

arXiv.org Artificial Intelligence, Apr-20-2018

Developing conversational agents to engage in complex dialogues is challenging partly because the dialogue policy needs to explore a large state-action space. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given a set of successful dialogue sessions, we present a Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a hierarchical policy which consists of 1) a top-level policy that selects among subgoals, and 2) a low-level policy that selects primitive actions to accomplish the subgoal. We exemplify our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that an agent trained with automatically discovered subgoals performs competitively against an agent with human-defined subgoals, and significantly outperforms an agent without subgoals. Moreover, we show that learned subgoals are human comprehensible.
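
The two-level control flow described in this abstract (a top-level policy choosing subgoals, a low-level policy choosing primitive actions within the active subgoal) can be sketched as below. Everything here is a hand-written assumption for illustration: the subgoal and action names are invented, and both policies are fixed rules, whereas in the paper the subgoals are discovered and the policies are learned.

```python
# Hypothetical subgoals and primitive actions for a travel-planning task.
SUBGOALS = {
    "book_flight": ["ask_origin", "ask_destination", "ask_date", "confirm_flight"],
    "book_hotel": ["ask_city", "ask_checkin", "confirm_hotel"],
}


def top_level_policy(completed):
    """Select the next subgoal: here, the first one not yet completed."""
    for subgoal in SUBGOALS:
        if subgoal not in completed:
            return subgoal
    return None  # every subgoal is done


def low_level_policy(subgoal, step):
    """Select the next primitive action for the active subgoal."""
    return SUBGOALS[subgoal][step]


def run_dialogue():
    """Alternate between the two levels until no subgoal remains."""
    completed, trace = [], []
    subgoal = top_level_policy(completed)
    while subgoal is not None:
        for step in range(len(SUBGOALS[subgoal])):
            trace.append((subgoal, low_level_policy(subgoal, step)))
        completed.append(subgoal)
        subgoal = top_level_policy(completed)
    return trace


trace = run_dialogue()
for subgoal, action in trace:
    print(f"{subgoal}: {action}")
```

The benefit of the hierarchy is that each level faces a much smaller decision space than a single flat policy over all primitive actions.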

machine learning, natural language, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1804.07855

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Consumer Products & Services > Travel (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Yang, Fangkai, Lyu, Daoming, Liu, Bo, Gustafson, Steven

arXiv.org Artificial Intelligence, Apr-20-2018

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.

machine learning, reinforcement, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

1804.07779

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Cross-domain Dialogue Policy Transfer via Simultaneous Speech-act and Slot Alignment

Mo, Kaixiang, Zhang, Yu, Yang, Qiang, Fung, Pascale

arXiv.org Artificial Intelligence, Apr-20-2018

Dialogue policy transfer enables us to build dialogue policies in a target domain with little data by leveraging knowledge from a source domain with plenty of data. Dialogue sentences are usually represented by speech-acts and domain slots, and dialogue policy transfer is usually achieved by assigning a slot mapping matrix based on human heuristics. However, existing dialogue policy transfer methods cannot transfer across dialogue domains with different speech-acts, for example, between systems built by different companies. Also, they depend on either common slots or slot entropy, which are not available when the source and target slots are totally disjoint and no database is available to calculate the slot entropy. To solve this problem, we propose a Policy tRansfer across dOMaIns and SpEech-acts (PROMISE) model, which is able to transfer dialogue policies across domains with different speech-acts and disjoint slots. The PROMISE model can learn to align different speech-acts and slots simultaneously, and it does not require common slots or the calculation of the slot entropy. Experiments on both real-world dialogue data and simulations demonstrate that the PROMISE model can effectively transfer dialogue policies across domains with different speech-acts and disjoint slots.

dialogue policy, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

1804.07691

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Outline Objects using Deep Reinforcement Learning

Wang, Zhenxin, Sarcar, Sayan, Liu, Jingxin, Zheng, Yilin, Ren, Xiangshi

arXiv.org Artificial Intelligence, Apr-20-2018

Image segmentation needs both local boundary position information and global object context information. The performance of the recent state-of-the-art method, fully convolutional networks, reaches a bottleneck due to the neural network limit after balancing between the two types of information simultaneously in an end-to-end training style. To overcome this problem, we divide semantic image segmentation into temporal subtasks. First, we find a possible pixel position of some object boundary; then we trace the boundary in steps of limited length until the whole object is outlined. We present the first deep reinforcement learning approach to semantic image segmentation, called DeepOutline, which outperforms other algorithms on the COCO detection leaderboard in the medium and large size person category on the COCO val2017 dataset. Meanwhile, it provides an insight into a divide-and-conquer approach to computer vision problems by reinforcement learning.

machine learning, reinforcement learning, segmentation, (16 more...)

arXiv.org Artificial Intelligence

1804.04603

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Optimising Traffic Using Reinforcement Learning – Becoming Human: Artificial Intelligence Magazine

#artificialintelligence, Apr-19-2018, 02:21:12 GMT

Fundamentally, the root of the urban traffic distribution problem is in multi-criteria decision making. The Reinforcement Learning framework, in which an agent learns from a model with optimal policy based on its environment, could provide an advantageous method for algorithmic development and network improvement. Each action that the agent takes leads to a reward or punishment, together with a new observation of the state. As it learns, the agent acquires a distributed routing policy that could maximise the capacity of an urban transport network. This process can be treated as a Markov Decision Process (MDP), which ultimately aims for the best solution by optimising a specific policy step by step.
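
As a rough illustration of treating routing as an MDP, the sketch below applies tabular Q-learning to a tiny invented road network: states are junctions, actions are the outgoing roads, and the reward is the negative travel time. The network `ROADS`, its travel times, and the hyperparameters are all assumptions made for this example; the article does not specify an algorithm or a network.

```python
import random

# Hypothetical road network: junction -> {next junction: travel time}.
ROADS = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
GOAL = "D"


def q_learning(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Learn action values with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in ROADS.items()}
    for _ in range(episodes):
        state = "A"
        while state != GOAL:
            actions = list(q[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[state][a])  # exploit
            reward = -ROADS[state][action]  # punish long travel times
            next_state = action
            best_next = max(q[next_state].values(), default=0.0)
            # Standard Q-learning update toward the bootstrapped target.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_route(q, start="A"):
    """Follow the highest-valued road from each junction until the goal."""
    route, state = [start], start
    while state != GOAL:
        state = max(q[state], key=q[state].get)
        route.append(state)
    return route


q = q_learning()
route = greedy_route(q)
print(route)
```

The learned values trade a slow direct road against a faster multi-hop route, which is the kind of step-by-step policy improvement the paragraph refers to.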

artificial intelligence magazine, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
