AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Domain Adversarial Reinforcement Learning for Partial Domain Adaptation

Chen, Jin, Wu, Xinxiao, Duan, Lixin, Gao, Shenghua

arXiv.org Machine Learning, May-10-2019

Partial domain adaptation aims to transfer knowledge from a label-rich source domain to a label-scarce target domain while relaxing the assumption that the two domains fully share a label space. In this more general and practical scenario, a major challenge is how to select source instances in the shared classes across different domains for positive transfer. To address this issue, we propose a Domain Adversarial Reinforcement Learning (DARL) framework that automatically selects source instances in the shared classes to circumvent negative transfer and simultaneously learns transferable features between domains by reducing the domain shift. Specifically, in this framework, we employ deep Q-learning to learn policies for an agent to make selection decisions by approximating the action-value function. Moreover, domain adversarial learning is introduced to learn domain-invariant features for the source instances selected by the agent and the target instances, and also to determine rewards for the agent based on how relevant the selected source instances are to the target domain. Experiments on several benchmark datasets demonstrate the superior performance of our DARL method over existing state-of-the-art methods for partial domain adaptation.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Machine Learning

1905.04094

Country: Asia (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

GAN-based Deep Distributional Reinforcement Learning for Resource Management in Network Slicing

Hua, Yuxiu, Li, Rongpeng, Zhao, Zhifeng, Zhang, Honggang, Chen, Xianfu

arXiv.org Machine Learning, May-10-2019

Network slicing is a key technology in 5G communication systems, which aims to dynamically and efficiently allocate resources for diversified services with distinct requirements over a common underlying physical infrastructure. Therein, demand-aware allocation is of significant importance to network slicing. In this paper, we consider a scenario that contains several slices in one base station sharing the same bandwidth. Deep reinforcement learning (DRL) is leveraged to solve this problem by regarding the varying demands and the allocated bandwidth as the environment state and action, respectively. To obtain a better quality of experience (QoE) satisfaction ratio and spectrum efficiency (SE), we propose a generative adversarial network (GAN) based deep distributional Q network (GAN-DDQN) to learn the distribution of state-action values. Furthermore, we estimate the distributions by approximating a full quantile function, which can make the training error more controllable. To protect the stability of GAN-DDQN's training process from the widely spanning utility values, we also put forward a reward-clipping mechanism. Finally, we verify the performance of the proposed GAN-DDQN algorithm through extensive simulations.

gan-ddqn, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1905.03929

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Industry: Telecommunications (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

Simchowitz, Max, Jamieson, Kevin

arXiv.org Machine Learning, May-9-2019

Reinforcement learning (RL) is a powerful paradigm for modeling a learning agent's interactions with an unknown environment, in an attempt to accumulate as much reward as possible. Because of its flexibility, RL can encode a vast array of different problem settings, many of which are entirely intractable. Therefore, it is crucial to understand what conditions make it possible for an RL agent to effectively learn about its environment. In this paper, we consider tabular Markov decision processes (MDPs), a canonical RL setting where the agent seeks to learn a policy mapping discrete states x ∈ S to one of finitely many actions a ∈ A, in an attempt to maximize cumulative reward over an episode horizon H. We shall study the regret setting, where the learner plays a policy π for a sequence of episodes k = 1, . . .

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1905.03814

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

A Reinforcement Learning Perspective on the Optimal Control of Mutation Probabilities for the (1+1) Evolutionary Algorithm: First Results on the OneMax Problem

Mossina, Luca, Rachelson, Emmanuel, Delahaye, Daniel

arXiv.org Artificial Intelligence, May-9-2019

We study how Reinforcement Learning can be employed to optimally control parameters in evolutionary algorithms. We control the mutation probability of a (1+1) evolutionary algorithm on the OneMax function. This problem is modeled as a Markov Decision Process and solved with Value Iteration via the known transition probabilities. It is then solved via Q-Learning, a Reinforcement Learning algorithm, where the exact transition probabilities are not needed. This approach also allows previous expert or empirical knowledge to be incorporated into learning. It opens new perspectives, both formally and computationally, for the problem of parameter control in optimization.
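
The Q-learning setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of state (current fitness), the candidate mutation rates, the reward of -1 per generation, and all hyperparameters and function names are assumptions made for illustration.

```python
import random


def run_episode(q, n, rates, rng, alpha=0.1, gamma=1.0, epsilon=0.1, max_steps=10_000):
    """Run one (1+1) EA on OneMax while a tabular Q-learner picks the mutation rate.

    Assumed modelling choices (not taken from the paper): the state is the
    current fitness, actions index into `rates`, and each generation costs -1.
    """
    parent = [rng.randint(0, 1) for _ in range(n)]
    fitness = sum(parent)
    steps = 0
    while fitness < n and steps < max_steps:
        state = fitness
        # Epsilon-greedy choice of the mutation rate for this generation.
        if rng.random() < epsilon:
            action = rng.randrange(len(rates))
        else:
            action = max(range(len(rates)), key=lambda a: q[state][a])
        # Standard bit-flip mutation: each bit flips independently.
        child = [bit ^ (rng.random() < rates[action]) for bit in parent]
        child_fitness = sum(child)
        # (1+1) selection: keep the child if it is at least as fit.
        if child_fitness >= fitness:
            parent, fitness = child, child_fitness
        # Q-learning update; the row q[n] is never updated, so the optimum
        # acts as a terminal state with value zero.
        target = -1.0 + gamma * max(q[fitness])
        q[state][action] += alpha * (target - q[state][action])
        steps += 1
    return steps


def train(n=10, rates=(0.05, 0.1, 0.3), episodes=300, seed=0):
    """Train a Q-table with n + 1 states (fitness 0..n) and one column per rate."""
    rng = random.Random(seed)
    q = [[0.0] * len(rates) for _ in range(n + 1)]
    for _ in range(episodes):
        run_episode(q, n, rates, rng)
    return q
```

With this reward, a state's greedy action is the mutation rate the learner currently estimates to reach the optimum in the fewest generations, and no transition probabilities are supplied to the learner.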

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1905.03726

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

Toward Packet Routing with Fully-distributed Multi-agent Deep Reinforcement Learning

You, Xinyu, Li, Xuanjie, Xu, Yuedong, Feng, Hui, Zhao, Jin

arXiv.org Artificial Intelligence, May-9-2019

Packet routing is one of the fundamental problems in computer networks, in which a router determines the next hop of each packet in the queue to get it to its destination as quickly as possible. Reinforcement learning has been introduced to design an autonomous packet routing policy, namely Q-routing, using only local information available to each router. However, the curse of dimensionality of Q-routing prohibits a more comprehensive representation of dynamic network states, thus limiting the potential benefit of reinforcement learning. Inspired by the recent success of deep reinforcement learning (DRL), we embed deep neural networks in multi-agent Q-routing. Each router possesses an independent neural network that is trained without communicating with its neighbors and makes decisions locally. Two multi-agent DRL-enabled routing algorithms are proposed: one simply replaces the Q-table of vanilla Q-routing with a deep neural network, and the other further employs extra information, including the past actions and the destinations of non-head-of-line packets. Our simulations show that the direct substitution of the Q-table by a deep neural network may not yield minimal delivery delays because the neural network does not learn more from the same input. When more information is utilized, the adaptive routing policy can converge and significantly reduce the packet delivery time.
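
For orientation, the tabular Q-routing baseline that this abstract builds on can be sketched as follows. This shows the classical update, not the paper's deep variants; the dictionary layout, function names, and learning rate are assumptions made for illustration.

```python
def q_routing_update(q_x, dest, neighbor, queue_delay, transmit_delay,
                     neighbor_estimate, lr=0.5):
    """Update router x's delivery-time estimate for sending to `dest` via `neighbor`.

    `q_x` maps (dest, neighbor) to the estimated remaining delivery time.
    `neighbor_estimate` is the neighbor's own best estimate for `dest`,
    reported back to x after the packet is forwarded.
    """
    old = q_x[(dest, neighbor)]
    # New sample: time spent in x's queue, plus the link delay, plus the
    # neighbor's estimate of the remaining time to the destination.
    target = queue_delay + transmit_delay + neighbor_estimate
    q_x[(dest, neighbor)] = old + lr * (target - old)
    return q_x[(dest, neighbor)]


def best_next_hop(q_x, dest, neighbors):
    """Pick the neighbor with the smallest estimated delivery time."""
    return min(neighbors, key=lambda y: q_x[(dest, y)])


# Hypothetical router A with neighbors B and C, routing toward destination D.
q_a = {("D", "B"): 10.0, ("D", "C"): 8.0}
q_routing_update(q_a, "D", "B", queue_delay=1.0, transmit_delay=1.0,
                 neighbor_estimate=2.0)
```

Each router only needs its own table and one scalar returned by the chosen neighbor, which is the "local information" property mentioned above. The table grows with the number of destinations and neighbors, which is where a function approximator becomes attractive.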

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1905.03494

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry:

Telecommunications > Networks (0.89)
Transportation (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Pretrain Soft Q-Learning with Imperfect Demonstrations

Zhang, Xiaoqin, Li, Yunfei, Ma, Huimin, Luo, Xiong

arXiv.org Machine Learning, May-9-2019

Pretraining reinforcement learning methods with demonstrations has been an important concept in the study of reinforcement learning, since a large amount of computing power is spent on online simulations with existing reinforcement learning algorithms. Pretraining reinforcement learning remains a significant challenge in exploiting expert demonstrations whilst keeping exploration potential, especially for value-based methods. In this paper, we propose a pretraining method for soft Q-learning. Our work is inspired by pretraining methods for actor-critic algorithms, since soft Q-learning is a value-based algorithm that is equivalent to policy gradient. The proposed method is based on $\gamma$-discounted biased policy evaluation with entropy regularization, which is also the updating target of soft Q-learning. Our method is evaluated on various tasks from Atari 2600. Experiments show that our method effectively learns from imperfect demonstrations, and outperforms other state-of-the-art methods that learn from expert demonstrations.
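
As background for the "updating target of soft Q-learning" mentioned above, here is a minimal sketch of the standard entropy-regularized target for a discrete action set. It does not reproduce the paper's pretraining procedure; the function names, the temperature parameter, and the default discount are assumptions made for illustration.

```python
import math


def soft_value(q_values, temperature=1.0):
    """Soft state value: temperature * log(sum(exp(Q / temperature))).

    The maximum is subtracted first for numerical stability.
    """
    m = max(q_values)
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def soft_policy(q_values, temperature=1.0):
    """Boltzmann policy implied by the soft Q-values: exp((Q - V) / temperature)."""
    v = soft_value(q_values, temperature)
    return [math.exp((q - v) / temperature) for q in q_values]


def soft_q_target(reward, next_q_values, gamma=0.99, temperature=1.0, done=False):
    """Bootstrapped target r + gamma * V_soft(s'); terminal transitions use r alone."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, temperature)
```

The soft value is never smaller than the largest Q-value, and as the temperature approaches zero it approaches the hard maximum used by ordinary Q-learning.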

demonstration, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1905.03501

Country: Asia > China (0.15)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.69)

Industry: Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning to Evolve

Schuchardt, Jan, Golkov, Vladimir, Cremers, Daniel

arXiv.org Machine Learning, May-8-2019

Evolution and learning are two of the fundamental mechanisms by which life adapts in order to survive and to transcend limitations. These biological phenomena inspired successful computational methods such as evolutionary algorithms and deep learning. Evolution relies on random mutations and on random genetic recombination. Here we show that learning to evolve, i.e. learning to mutate and recombine better than at random, improves the result of evolution in terms of fitness increase per generation and even in terms of attainable fitness. We use deep reinforcement learning to learn to dynamically adjust the strategy of evolutionary algorithms to varying circumstances. Our methods outperform classical evolutionary algorithms on combinatorial and continuous optimization problems.

evolutionary algorithm, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1905.03389

Country: North America > United States (0.67)

Genre: Research Report (0.40)

Industry:

Energy (0.67)
Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Smoothing Policies and Safe Policy Gradients

Papini, Matteo, Pirotta, Matteo, Restelli, Marcello

arXiv.org Machine Learning, May-8-2019

Policy gradient algorithms are among the best candidates for the much-anticipated application of reinforcement learning to real-world control tasks, such as the ones arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify those meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By a joint, adaptive selection of these meta-parameters, we obtain a safe policy gradient algorithm.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1905.03231

Country: Europe (0.28)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Meta-learning of Sequential Strategies

Ortega, Pedro A., Wang, Jane X., Rowland, Mark, Genewein, Tim, Kurth-Nelson, Zeb, Pascanu, Razvan, Heess, Nicolas, Veness, Joel, Pritzel, Alex, Sprechmann, Pablo, Jayakumar, Siddhant M., McGrath, Tom, Miller, Kevin, Azar, Mohammad, Osband, Ian, Rabinowitz, Neil, György, András, Chiappa, Silvia, Osindero, Simon, Teh, Yee Whye, van Hasselt, Hado, de Freitas, Nando, Botvinick, Matthew, Legg, Shane

arXiv.org Machine Learning, May-8-2019

In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.

machine learning, prediction, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1905.0303

Country:

Europe (0.28)
North America > United States (0.28)

Genre:

Research Report (0.40)
Overview (0.34)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Object Exchangeability in Reinforcement Learning: Extended Abstract

Mern, John, Sadigh, Dorsa, Kochenderfer, Mykel

arXiv.org Machine Learning, May-7-2019

Although deep reinforcement learning has advanced significantly over the past several years, sample efficiency remains a major challenge. Careful choice of input representations can help improve efficiency depending on the structure present in the problem. In this work, we present an attention-based method to project inputs into an efficient representation space that is invariant under changes to input ordering. We show that our proposed representation results in a search space that is a factor of m! smaller for inputs of m objects. Our experiments demonstrate improvements in sample efficiency for policy gradient methods on a variety of tasks. We show that our representation allows us to solve problems that are otherwise intractable when using naive approaches.
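
The idea of an order-invariant, attention-based representation can be illustrated with a generic attention-pooling sketch. The abstract does not specify the architecture, so this is not the authors' method: the fixed `query` vector stands in for learned parameters, and the function name is hypothetical.

```python
import math


def attention_pool(objects, query):
    """Pool a set of object feature vectors into one vector, independent of order.

    Each object gets a score from its dot product with `query`; the scores go
    through a softmax and weight a sum of the object vectors. Reordering
    `objects` reorders scores and vectors together, so the pooled vector is
    unchanged up to floating-point rounding.
    """
    scores = [sum(q * x for q, x in zip(query, obj)) for obj in objects]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(query)
    return [
        sum(w * obj[i] for w, obj in zip(weights, objects)) / total
        for i in range(dim)
    ]


# Three hypothetical 2-dimensional objects, presented in two different orders.
objects = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
query = [0.5, -0.5]
pooled = attention_pool(objects, query)
pooled_reordered = attention_pool(objects[::-1], query)
```

Because all m! orderings of m objects map to the same pooled vector, a policy built on it does not have to learn each ordering separately, which is the source of the m! reduction claimed in the abstract.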

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1905.02698

Country:

North America > United States > California > Santa Clara County > Stanford (0.16)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > Canada > Quebec > Montreal (0.05)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
