AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

On Policy Gradients

Kämmerer, Mattis Manfred

arXiv.org Machine Learning, Nov-12-2019

The goal of policy gradient approaches is to find a policy in a given class of policies which maximizes the expected return. Given a differentiable model of the policy, we want to apply a gradient-ascent technique to reach a local optimum. We mainly use gradient ascent because it is theoretically well researched. The main issue is that the gradient of the expected return with respect to the policy parameters is not available, so we need to estimate it. As policy gradient algorithms also tend to require on-policy data for the gradient estimate, their biggest weakness is sample efficiency. For this reason, most research is focused on finding algorithms with improved sample efficiency. This paper provides a formal introduction to policy gradients that shows the development of policy gradient approaches, and should enable the reader to follow current research on the topic.
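
As a rough illustration of estimating the policy gradient from on-policy samples, the following minimal sketch applies the score-function (REINFORCE) estimator with a running-average baseline to a toy multi-armed bandit. It is not taken from the paper: the function names, the bandit setting, and the values of `reward_means`, `steps`, `lr`, and `seed` are all assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(reward_means, steps=2000, lr=0.1, seed=0):
    """Score-function (REINFORCE) gradient ascent on a toy bandit.

    All arguments are illustrative assumptions, not values from the paper.
    Returns the final action probabilities of the softmax policy.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(reward_means)  # policy parameters (action preferences)
    baseline = 0.0                     # running average reward, reduces variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # On-policy sample: the action is drawn from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(reward_means[action], 1.0)
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        # For a softmax policy, d log pi(action) / d theta[k]
        # equals 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
    return softmax(theta)
```

Calling `reinforce_bandit([0.0, 2.0])` returns a probability vector that should place most of its mass on the second arm. Every update consumes a fresh sample from the current policy, which is the sample-efficiency cost the abstract refers to.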

algorithm, gradient, policy gradient

1911.04817

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > Ohio (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Schedule Earth Observation satellites with Deep Reinforcement Learning

Hadj-Salah, Adrien, Verdier, Rémi, Caron, Clément, Picard, Mathieu, Capelle, Mikaël

arXiv.org Artificial Intelligence, Nov-12-2019

Requests come in a variety of sizes and constraints, from the urgent monitoring of small areas to large area coverage. In this work we are particularly interested in the latter case, with requests covering whole countries or even continents. Depending on weather conditions, such requests may take several months to complete, even with multiple satellites. In order to shorten the time required to fulfill requests, the mission orchestrator shall schedule acquisitions with both a short-term and a long-term strategy. Determining a strategy robust to an uncertain environment is a complex task, which is why current solutions mainly consist of heuristics configured by human experts. This paper demonstrates that Reinforcement Learning (RL) might be well-suited for such a challenge. RL has proven to be of great value since these algorithms have mastered several games such as Pong on Atari 2600 (Mnih et al. 2013), Go with AlphaGo (Silver et al. 2017) and more recently StarCraft (Arulkumaran, Cully, and Togelius 2019).

mesh, reinforcement learning, satellite

1911.05696

Country: Europe > France (0.05)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

MSDF: A Deep Reinforcement Learning Framework for Service Function Chain Migration

Chen, Ruoyun, Lu, Hancheng, Lu, Yujiao, Liu, Jinxue

arXiv.org Artificial Intelligence, Nov-12-2019

Under dynamic traffic, service function chain (SFC) migration is considered an effective way to improve resource utilization. However, the lack of future network information leads to non-optimal solutions, which motivates us to study reinforcement learning based SFC migration from a long-term perspective. In this paper, we formulate the SFC migration problem as a minimization problem whose objective is the total network operation cost, under constraints on users' quality of service. We first design a deep Q-network based algorithm to solve the single-SFC migration problem, which can adjust the migration strategy online without knowing future information. Further, a novel multi-agent cooperative framework, called MSDF, is proposed to address the challenge of migrating multiple SFCs on the basis of single-SFC migration. MSDF reduces complexity and thus accelerates convergence, especially in large-scale networks. Experimental results demonstrate that MSDF outperforms typical heuristic algorithms under various scenarios.
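
A deep Q-network approximates with a neural network the action values that tabular Q-learning stores in a table. The sketch below shows only the tabular update rule, not the paper's algorithm. The state names, the action set, the step size `alpha`, the discount `gamma`, and the choice of reward as negative operation cost are hypothetical and serve only to illustrate how a cost-minimization objective can be cast as reward maximization.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: states are hosting nodes, actions are migration choices.
q = defaultdict(float)
actions = ["stay", "migrate"]
# Reward is the negative operation cost, so lower cost means higher return.
value = q_update(q, "node_a", "migrate", -2.0, "node_b", actions)
```

Because the update needs only the current transition, it can run online without knowledge of future traffic, which is the property the abstract highlights.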

migration, node, subagent

1911.04801

Country:

North America > United States (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

DRiLLS: Deep Reinforcement Learning for Logic Synthesis

Hosny, Abdelrahman, Hashemi, Soheil, Shalan, Mohamed, Reda, Sherief

arXiv.org Artificial Intelligence, Nov-12-2019

Logic synthesis requires extensive tuning of the synthesis optimization flow, where the quality of results (QoR) depends on the sequence of optimizations used. Efficient design space exploration is challenging due to the exponential number of possible optimization permutations. Therefore, automating the optimization process is necessary. In this work, we propose a novel reinforcement learning-based methodology that navigates the optimization space without human intervention. We demonstrate the training of an Advantage Actor Critic (A2C) agent that seeks to minimize area subject to a timing constraint. Using the proposed methodology, designs can be optimized autonomously with no human in the loop. Evaluation on the comprehensive EPFL benchmark suite shows that the agent outperforms existing exploration methodologies and improves QoRs by an average of 13%. Logic synthesis transforms a high-level description of a design into an optimized gate-level representation. Modern logic synthesis tools represent a given design as an And-Inverter Graph (AIG), which encodes representative characteristics for optimizing Boolean functions.

agent, delay constraint, optimization

1911.04021

Country:

North America > United States > Rhode Island > Providence County > Providence (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Reinforcement-Learning-Based Distributed Resource Selection Algorithm for Massive IoT

#artificialintelligence, Nov-11-2019, 22:34:51 GMT

Massive IoT, which involves a large number of resource-constrained IoT devices, has gained great attention. IoT devices generate enormous traffic, which causes network congestion. To manage network congestion, multi-channel-based algorithms have been proposed. However, most of the existing multi-channel algorithms require strict synchronization and incur extra overhead for negotiating channel assignment, which poses significant challenges to resource-constrained IoT devices. In this paper, a distributed channel selection algorithm utilizing the tug-of-war (TOW) dynamics is proposed for improving successful frame delivery across the whole network by letting IoT devices adaptively select suitable channels for communication. The proposed TOW dynamics-based channel selection algorithm has a simple reinforcement learning procedure that only needs to receive the acknowledgment (ACK) frame, while requiring minimal memory and computation capability.

algorithm, resource selection algorithm, resource-constrained iot device

Industry: Information Technology (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Asynchronous Methods for Deep Reinforcement Learning

#artificialintelligence, Nov-11-2019, 17:22:54 GMT

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers.

asynchronous method, deep reinforcement learning, neural network controller

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

14 Different Types of Learning in Machine Learning

#artificialintelligence, Nov-11-2019, 15:39:20 GMT

The use of an environment means that there is no fixed training dataset, rather a goal or set of goals that an agent is required to achieve, actions they may perform, and feedback about performance toward the goal. Some machine learning algorithms do not just experience a fixed dataset. For example, reinforcement learning algorithms interact with an environment, so there is a feedback loop between the learning system and its experiences.
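
The feedback loop described here can be sketched as a short interaction loop. The `Corridor` environment, its reward scheme, and the `run_episode` helper are hypothetical names invented for this illustration and do not come from the article.

```python
class Corridor:
    """Hypothetical 1-D environment: start at 0, goal at position `length`."""

    def __init__(self, length=3):
        self.length = length
        self.position = 0

    def reset(self):
        """Return the agent to the start and give the first observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Move left (-1) or right (+1); return (observation, reward, done)."""
        self.position = max(0, self.position + action)
        done = self.position >= self.length
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Feedback loop: observe, act, receive reward, repeat until done."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

A policy is any function from observation to action. For example, `run_episode(Corridor(3), lambda obs: 1)` always moves right and collects the goal reward. There is no fixed dataset here: the data the agent sees depends on the actions it chooses.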

algorithm, deep learning, learning

Industry: Education (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Driving Reinforcement Learning with Models

Ferraro, Pietro, Rathi, Meghana, Russo, Giovanni

arXiv.org Artificial Intelligence, Nov-11-2019

Over the years, Reinforcement Learning (RL) has established itself as a convenient paradigm for learning optimal policies from data. However, most RL algorithms achieve optimal policies by exploring all the possible actions, and in real-world scenarios this is often infeasible or impractical due to, e.g., safety constraints. Motivated by this, in this paper we propose to augment RL with Model Predictive Control (MPC), a popular model-based control algorithm that allows a system to be controlled optimally while satisfying a set of constraints. The result is an algorithm, the MPC-augmented RL algorithm (MPCaRL), that makes use of MPC both to drive how RL explores the actions and to modify the corresponding rewards. We demonstrate the effectiveness of MPCaRL by letting it play the Atari game Pong. The results obtained highlight the ability of the algorithm to learn general tasks with essentially no training.

artificial intelligence, computer game, functionality

1911.044

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.50)

Industry:

Energy > Oil & Gas (1.00)
Leisure & Entertainment > Games > Computer Games (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Representations in Reinforcement Learning: An Information Bottleneck Approach

Yingjun, Pei, Xinwen, Hou

arXiv.org Artificial Intelligence, Nov-11-2019

The information bottleneck principle is an elegant and useful approach to representation learning. In this paper, we investigate the problem of representation learning in the context of reinforcement learning using the information bottleneck framework, aiming at improving the sample efficiency of the learning algorithms. We analytically derive the optimal conditional distribution of the representation, and provide a variational lower bound. Then, we maximize this lower bound with the Stein variational (SV) gradient method. We incorporate this framework in the advantage actor-critic algorithm (A2C) and the proximal policy optimization algorithm (PPO). Our experimental results show that our framework can improve the sample efficiency of vanilla A2C and PPO significantly. Finally, we study the information bottleneck (IB) perspective in deep RL with the algorithm called mutual information neural estimation (MINE). We experimentally verify that the information extraction-compression process also exists in deep RL and that our framework is capable of accelerating this process. We also analyze the relationship between MINE and our method; through this relationship, we theoretically derive an algorithm to optimize our IB framework without constructing the lower bound.

artificial intelligence, machine learning, reinforcement learning

1911.05695

Country:

North America > United States > Illinois (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement-Learning-Based Variational Quantum Circuits Optimization for Combinatorial Problems

Khairy, Sami, Shaydulin, Ruslan, Cincio, Lukasz, Alexeev, Yuri, Balaprakash, Prasanna

arXiv.org Machine Learning, Nov-11-2019

Quantum computing exploits basic quantum phenomena such as state superposition and entanglement to perform computations. The Quantum Approximate Optimization Algorithm (QAOA) is arguably one of the leading quantum algorithms that can outperform classical state-of-the-art methods in the near term. QAOA is a hybrid quantum-classical algorithm that combines a parameterized quantum state evolution with a classical optimization routine to approximately solve combinatorial problems. The quality of the solution obtained by QAOA within a fixed budget of calls to the quantum computer depends on the performance of the classical optimization routine used to optimize the variational parameters. In this work, we propose an approach based on reinforcement learning (RL) to train a policy network that can be used to quickly find high-quality variational parameters for unseen combinatorial problem instances. The RL agent is trained on small problem instances which can be simulated on a classical computer, yet the learned RL policy is generalizable and can be used to efficiently solve larger instances. Extensive simulations using the IBM Qiskit Aer quantum circuit simulator demonstrate that our trained RL policy can reduce the optimality gap by a factor of up to 8.61 compared with other off-the-shelf optimizers tested.

algorithm, objective, optimizer

1911.04574

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Illinois (0.04)
North America > Canada (0.04)

Genre: Research Report (1.00)

Industry:

Energy (0.71)
Government > Regional Government (0.48)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
