AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Multi-View Reinforcement Learning

Li, Minne, Wu, Lisheng, Ammar, Haitham Bou, Wang, Jun

arXiv.org Machine LearningOct-18-2019

This paper is concerned with multi-view reinforcement learning (MVRL), which allows for decision making when agents share common dynamics but adhere to different observation models. We define the MVRL framework by extending partially observable Markov decision processes (POMDPs) to support more than one observation model and propose two solution methods through observation augmentation and cross-view policy transfer. We empirically evaluate our method and demonstrate its effectiveness in a variety of environments. Specifically, we show reductions in sample complexities and computational time for acquiring policies that handle multi-view environments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1910.08285

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > Canada (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Kumar, Harshat, Koppel, Alec, Ribeiro, Alejandro

arXiv.org Machine LearningOct-18-2019

Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps to estimate the value function and policy gradient updates. Due to the fact that the updates exhibit correlated noise and biased gradient updates, only the asymptotic behavior of actor-critic is known by connecting its behavior to dynamical systems. This work puts forth a new variant of actor-critic that employs Monte Carlo rollouts during the policy search updates, which results in controllable bias that depends on the number of critic evaluations. As a result, we are able to provide for the first time the convergence rate of actor-critic algorithms when the policy search step employs policy gradient, agnostic to the choice of policy evaluation technique. In particular, we establish conditions under which the sample complexity is comparable to stochastic gradient method for non-convex problems or slower as a result of the critic estimation error, which is the main complexity bottleneck. These results hold for in continuous state and action spaces with linear function approximation for the value function. We then specialize these conceptual results to the case where the critic is estimated by Temporal Difference, Gradient Temporal Difference, and Accelerated Gradient Temporal Difference. These learning rates are then corroborated on a navigation problem involving an obstacle, which suggests that learning more slowly may lead to improved limit points, providing insight into the interplay between optimization and generalization in reinforcement learning.

approximation, assumption, convergence, (12 more...)

arXiv.org Machine Learning

1910.08412

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.63)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)

Add feedback

Graph Convolutional Policy for Solving Tree Decomposition via Reinforcement Learning Heuristics

Khakhulin, Taras, Schutski, Roman, Oseledets, Ivan

arXiv.org Machine LearningOct-18-2019

We propose a Reinforcement Learning based approach to approximately solve the Tree Decomposition (TD)problem. TD is a combinatorial problem, which is central to the analysis of graph minor structure and computational complexity, as well as in the algorithms of probabilistic inference, register allocation, and other practical tasks. Recently, it has been shown that combinatorial problems can be successively solved by learned heuristics. However, the majority of existing works do not address the question of the generalization of learning-based solutions. Our model is based on the graph convolution neural network (GCN) for learning graph representations. We show that the agent builton GCN and trained on a single graph using an Actor-Critic method can efficiently generalize to real-world TD problem instances. We establish that our method successfully generalizes from small graphs, where TD can be found by exact algorithms, to large instances of practical interest, while still having very low time-to-solution. On the other hand, the agent-based approach surpasses all greedy heuristics by the quality of the solution.

agent, algorithm, graph, (13 more...)

arXiv.org Machine Learning

1910.08371

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > Hungary (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

OffWorld Gym: open-access physical robotics environment for real-world reinforcement learning benchmark and research

Kumar, Ashish, Buckley, Toby, Wang, Qiaozhi, Kavelaars, Alicia, Kuzovkin, Ilya

arXiv.org Artificial IntelligenceOct-18-2019

Success stories of applied machine learning can be traced back to the datasets and environments that were put forward as challenges for the community. The challenge that the community sets as a benchmark is usually the challenge that the community eventually solves. The ultimate challenge of reinforcement learning research is to train real agents to operate in the real environment, but until now there has not been a common real-world RL benchmark. In this work, we present a prototype real-world environment from OffWorld Gym -- a collection of real-world environments for reinforcement learning in robotics with free public remote access. Close integration into existing ecosystem allows the community to start using OffWorld Gym without any prior experience in robotics and takes away the burden of managing a physical robotics system, abstracting it under a familiar API. We introduce a navigation task, where a robot has to reach a visual beacon on an uneven terrain using only the camera input and provide baseline results in both the real environment and the simulated replica. To start training, visit https://gym.offworld.ai.

arxiv preprint arxiv, reinforcement, robot, (14 more...)

arXiv.org Artificial Intelligence

1910.08639

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games > Computer Games (0.95)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Unsupervised Context Rewriting for Open Domain Conversation

Zhou, Kun, Zhang, Kai, Wu, Yu, Liu, Shujie, Yu, Jingsong

arXiv.org Artificial IntelligenceOct-18-2019

Context modeling has a pivotal role in open domain conversation. Existing works either use heuristic methods or jointly learn context modeling and response generation with an encoder-decoder framework. This paper proposes an explicit context rewriting method, which rewrites the last utterance by considering context history. We leverage pseudo-parallel data and elaborate a context rewriting network, which is built upon the CopyNet with the reinforcement learning method. The rewritten utterance is beneficial to candidate retrieval, explainable context modeling, as well as enabling to employ a single-turn framework to the multi-turn scenario. The empirical results show that our model outperforms baselines in terms of the rewriting quality, the multi-turn response generation, and the end-to-end retrieval-based chatbots.

information, response generation, utterance, (14 more...)

arXiv.org Artificial Intelligence

1910.08282

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Fujian Province > Xiamen (0.04)
Europe > Germany > Berlin (0.04)
(14 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Soft Actor-Critic for Discrete Action Settings

Christodoulou, Petros

arXiv.org Artificial IntelligenceOct-18-2019

Soft Actor-Critic is a state-of-the-art reinforcement learning algorithm for continuous action settings that is not applicable to discrete action settings. Many important settings involve discrete actions, however, and so here we derive an alternative version of the Soft Actor-Critic algorithm that is applicable to discrete action settings. We then show that, even without any hyperparameter tuning, it is competitive with the tuned model-free state-of-the-art on a selection of games from the Atari suite.

algorithm, hyperparameter, soft actor-critic, (13 more...)

arXiv.org Artificial Intelligence

1910.07207

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Decision Automation for Electric Power Network Recovery

Sarkale, Yugandhar, Nozhati, Saeed, Chong, Edwin K. P., Ellingwood, Bruce R.

arXiv.org Artificial IntelligenceOct-18-2019

Critical infrastructure systems such as electric power networks, water networks, and transportation systems play a major role in the welfare of any community. In the aftermath of disasters, their recovery is of paramount importance; orderly and efficient recovery involves the assignment of limited resources (a combination of human repair workers and machines) to repair damaged infrastructure components. The decision maker must also deal with uncertainty in the outcome of the resource-allocation actions during recovery. The manual assignment of resources seldom is optimal despite the expertise of the decision maker because of the large number of choices and uncertainties in consequences of sequential decisions. This combinatorial assignment problem under uncertainty is known to be \mbox{NP-hard}. We propose a novel decision technique that addresses the massive number of decision choices for large-scale real-world problems; in addition, our method also features an experiential learning component that adaptively determines the utilization of the computational resources based on the performance of a small number of choices. Our framework is closed-loop, and naturally incorporates all the attractive features of such a decision-making system. In contrast to myopic approaches, which do not account for the future effects of the current choices, our methodology has an anticipatory learning component that effectively incorporates \emph{lookahead} into the solutions. To this end, we leverage the theory of regression analysis, Markov decision processes (MDPs), multi-armed bandits, and stochastic models of community damage from natural disasters to develop a method for near-optimal recovery of communities. Our method contributes to the general problem of MDPs with massive action spaces with application to recovery of communities affected by hazards.

algorithm, algorithm 3, recovery, (15 more...)

arXiv.org Artificial Intelligence

1910.00699

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Colorado > Larimer County > Fort Collins (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Energy > Power Industry (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
(2 more...)

Add feedback

Reinforcement Learning -- Generalisation of Off-Policy Learning

#artificialintelligenceOct-17-2019, 23:12:55 GMT

Till now, we have extended our reinforcement learning topic from discrete state to continuous state and have elaborated a bit on applying tile coding to on-policy learning, that is the learning process follows the trajectory the agent takes. Now let's have a talk of off-policy learning in continuous settings. While in discrete settings, on-policy learning can easily be generalised to off-policy learning(say, from Sarsa to Q-learning), in continuous settings, the generalisation can be a little troublesome, and in some scenarios can cause divergence issues. The most prominent consequence of off-policy learning is it may not necessarily converge in continuous settings. The major reason is caused by the distribution of updates in the off-policy case is not according to the on-policy distribution, that is the state, action being used to update might not be the state, action the agent takes.

behaviour policy, learning, off-policy learning, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Three Things to Know About Reinforcement Learning - KDnuggets

#artificialintelligenceOct-17-2019, 14:37:08 GMT

If you are following technology news, you have likely already read about how AI programs trained with reinforcement learning beat human players in board games like Go and chess, as well as video games. As an engineer, scientist, or researcher, you may want to take advantage of this new and growing technology, but where do you start? The best place to begin is to understand what the concept is, how to implement it, and whether it's the right approach for a given problem. If we simplify the concept, at its foundation, reinforcement learning is a type of machine learning that has the potential to solve toughdecision-making problems. Reinforcement learning is a type of machine learning in whicha computer learns to perform a task through repeated trial-and-error interactions with a dynamic environment.

neural network, reinforcement, training algorithm, (12 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Facebook's ReAgent is a toolkit for reinforcement learning and reasoning AI

#artificialintelligenceOct-17-2019, 06:02:02 GMT

Facebook AI Research today introduced ReAgent, a reinforcement learning toolkit for building decision-making AI that can receive feedback. ReAgent can assign scores to user actions and treat user input such as clicking on recommended content as training data. ReAgent is a small C library available for download on GitHub designed to be embedded in any application. The toolkit comes with a set of decision-making AI models to get started, an offline module for model performance assessment, and a platform to deploy AI into production using the TorchScript library in PyTorch. Horizon, a reinforcement learning platform for deployment of large-scale models in production open-sourced by Facebook in November 2018, is now part of ReAgent.

facebook, reagent, reinforcement learning and reasoning ai, (7 more...)

#artificialintelligence

Industry: Information Technology > Services (0.76)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.43)

Add feedback