AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
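
The trial-and-error idea in the quotation can be illustrated with a small multi-armed bandit. The sketch below is not taken from Sutton and Barto's text verbatim; the function name `epsilon_greedy_bandit`, the Gaussian reward model, and all parameter values are assumptions chosen for illustration. The learner is never told which action is best and has to estimate each action's reward by trying it.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate action values by trial and error on a Gaussian bandit.

    true_means is hidden from the learner; it is only used to draw rewards.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical reward means; the third action is the best one.
estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 1.5])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action:", best, "pull counts:", counts)
```

With a small exploration rate, most pulls end up on whichever action the running estimates favour, which is the "discover by trying" behaviour the definition describes.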

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

Jaques, Natasha, Gu, Shixiang, Bahdanau, Dzmitry, Hernández-Lobato, José Miguel, Turner, Richard E., Eck, Douglas

arXiv.org Artificial Intelligence | Oct-16-2017

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.
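
The trade-off the abstract describes, earning task reward while staying close to the MLE prior, can be sketched for a single next-token decision. This is a generic KL-regularized objective, not necessarily the exact formulation derived in the paper; the function names, the temperature-like constant `c`, and the example distributions are all assumptions made for illustration.

```python
import math

def kl_divergence(pi, prior):
    """KL(pi || prior) for two discrete distributions over the same tokens.

    Assumes prior[i] > 0 wherever pi[i] > 0.
    """
    return sum(p * math.log(p / q) for p, q in zip(pi, prior) if p > 0.0)

def kl_regularized_value(pi, prior, rewards, c=1.0):
    """Expected task reward scaled by 1/c, minus divergence from the prior."""
    expected_reward = sum(p * r for p, r in zip(pi, rewards))
    return expected_reward / c - kl_divergence(pi, prior)

# Hypothetical next-token probabilities from the pre-trained MLE model.
prior = [0.7, 0.2, 0.1]
# Hypothetical task reward for emitting each token.
rewards = [0.0, 1.0, 0.0]

greedy = [0.0, 1.0, 0.0]     # maximizes reward but ignores the prior
blended = [0.45, 0.5, 0.05]  # shifts mass toward the rewarded token

print("greedy :", kl_regularized_value(greedy, prior, rewards))
print("blended:", kl_regularized_value(blended, prior, rewards))
```

Under these assumed numbers the blended policy scores higher than the purely greedy one, because the greedy policy pays a large KL penalty for abandoning what the prior learned from data.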

machine learning, reinforcement learning, sequence tutor, (16 more...)

arXiv.org Artificial Intelligence

1611.02796

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Two AIs Go Head-to-Head on Atari's 'Breakout' to Test Deep Learning

#artificialintelligence | Oct-15-2017, 05:30:15 GMT

It seems like every day brings a new AI more capable than the last. This was recently apparent with AlphaGo--it was pretty great at beating Breakout, then Google got involved and soon it was capable of beating the world's leading Go champion. To do this, AlphaGo uses what is known as 'deep reinforcement learning'. For example, in Breakout, it will take raw image frames of the game as it's being played. Whether or not the ball is hitting the bricks in those frames will decide whether or not positive reinforcement is registered.

machine learning, reinforcement, reinforcement learning, (9 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Go (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Manifold Regularization for Kernelized LSTD

Yan, Xinyan, Choromanski, Krzysztof, Boots, Byron, Sindhwani, Vikas

arXiv.org Machine Learning | Oct-15-2017

Policy evaluation or value function or Q-function approximation is a key procedure in reinforcement learning (RL). It is a necessary component of policy iteration and can be used for variance reduction in policy gradient methods. Therefore its quality has a significant impact on most RL algorithms. Motivated by manifold regularized learning, we propose a novel kernelized policy evaluation method that takes advantage of the intrinsic geometry of the state space learned from data, in order to achieve better sample efficiency and higher accuracy in Q-function approximation. Applying the proposed method in the Least-Squares Policy Iteration (LSPI) framework, we observe superior performance compared to widely used parametric basis functions on two standard benchmarks in terms of policy quality.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1710.05387

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Estimating Dynamic Treatment Regimes in Mobile Health Using V-learning

Luckett, Daniel J., Laber, Eric B., Kahkoska, Anna R., Maahs, David M., Mayer-Davis, Elizabeth, Kosorok, Michael R.

arXiv.org Machine Learning | Oct-14-2017

The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best healthcare possible for each patient. Mobile technologies have an important role to play in this vision as they offer a means to monitor a patient's health status in real-time and subsequently to deliver interventions if, when, and in the dose that they are needed. Dynamic treatment regimes formalize individualized treatment plans as sequences of decision rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. However, existing methods for estimating optimal dynamic treatment regimes are designed for a small number of fixed decision points occurring on a coarse time-scale. We propose a new reinforcement learning method for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an outpatient setting. The proposed method accommodates an indefinite time horizon and minute-by-minute decision making that are common in mobile health applications. We show the proposed estimators are consistent and asymptotically normal under mild conditions. The proposed methods are applied to estimate an optimal dynamic treatment regime for controlling blood glucose levels in patients with type 1 diabetes.

machine learning, reinforcement learning, treatment regime, (19 more...)

arXiv.org Machine Learning

1611.03531

Country: North America > United States > North Carolina (0.46)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

video-friday-spoon-robotic-creatures-ros-industrial-machine-knitting

IEEE Spectrum Robotics Channel | Oct-13-2017, 21:16:51 GMT

Deep reinforcement learning (DRL) provides a model-agnostic approach to control complex dynamical systems, but has not been shown to scale to high-dimensional dexterous manipulation. Furthermore, deployment of DRL on physical systems remains challenging due to sample inefficiency. In this work, we show that model-free DRL with natural policy gradients can effectively scale up to complex manipulation tasks with a high-dimensional 24-DoF hand, and solve them from scratch in simulated experiments. We demonstrate successful policies for multiple complex tasks: object relocation, in-hand manipulation, tool use, and door opening.

artificial intelligence, reinforcement learning, robot, (18 more...)

IEEE Spectrum Robotics Channel

Industry:

Information Technology > Robotics & Automation (0.76)
Leisure & Entertainment > Sports (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)

May the Best AI Win: Artificial Intelligence Learns Sumo Wrestling (VIDEO)

#artificialintelligence | Oct-13-2017, 20:35:07 GMT

RoboSumo, one of the latest OpenAI experiments in machine learning, involves a pair of 'robots' dropped into a virtual arena without even the knowledge necessary to walk, and forced to learn the tricks of sumo wrestling purely by trial and error. The video posted on YouTube shows how the bots initially clash without employing any tactics or strategy, but after a number of bouts their movements start to resemble those of human wrestlers, as they learn to dodge and attack. According to Wired, OpenAI researchers created RoboSumo because the competition apparently generated extra complexity which "could allow faster progress than just giving reinforcement learning software more complex problems to solve alone." "When you interact with other agents you have to adapt; if you don't you'll lose," Maruan Al-Shedivat, one of the RoboSumo creators, said.

large language model, machine learning, reinforcement learning, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.31)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.31)

Unsupervised Real-Time Control through Variational Empowerment

Karl, Maximilian, Soelch, Maximilian, Becker-Ehmck, Philip, Benbouzid, Djalel, van der Smagt, Patrick, Bayer, Justin

arXiv.org Machine Learning | Oct-13-2017

We introduce a methodology for efficiently computing a lower bound to empowerment, allowing it to be used as an unsupervised cost function for policy learning in real-time control. Empowerment, being the channel capacity between actions and states, maximises the influence of an agent on its near future. It has been shown to be a good model of biological behaviour in the absence of an extrinsic goal. But empowerment is also prohibitively hard to compute, especially in nonlinear continuous spaces. We introduce an efficient, amortised method for learning empowerment-maximising policies. We demonstrate that our algorithm can reliably handle continuous dynamical systems using system dynamics learned from raw data. The resulting policies consistently drive the agents into states where they can use their full potential.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1710.05101

Country: Europe (0.93)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning

Lee, Kyungjae, Choi, Sungjoon, Oh, Songhwai

arXiv.org Machine Learning | Oct-13-2017

Markov decision processes (MDPs) have been widely used as a mathematical framework to solve stochastic sequential decision problems, such as autonomous driving [1], path planning [2], and quadrotor control [3]. In general, the goal of an MDP is to find the optimal policy function which maximizes the expected return. The expected return is a performance measure of a policy function and it is often defined as the expected sum of discounted rewards. An MDP is often used to formulate reinforcement learning (RL) [4], which aims to find the optimal policy without the explicit specification of stochasticity of an environment, and inverse reinforcement learning (IRL) [5], whose goal is to search for the proper reward function that can explain the behavior of an expert who follows the underlying optimal policy. While the optimal solution of an MDP is a deterministic policy, it is not desirable to apply an MDP to problems with multiple optimal actions. From the perspective of RL, the knowledge of multiple optimal actions makes it possible to cope with unexpected situations. For example, suppose that an autonomous vehicle has multiple optimal routes to reach a given goal. If a traffic accident occurs on the currently selected optimal route, it is possible to avoid the accident by choosing another safe optimal route without additional computation of a new optimal route.
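
The "expected sum of discounted rewards" mentioned in the abstract can be written out in a few lines. This is a minimal sketch of the standard definition only, not of the paper's sparse Tsallis entropy method; the function names and the sample reward sequences are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t], accumulated backwards from the last step."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

def expected_return(trajectories, gamma):
    """Monte Carlo estimate: average discounted return over sampled trajectories."""
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)

# Hypothetical reward sequences collected by following one policy.
sampled = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
print(expected_return(sampled, gamma=0.5))
```

Setting `gamma` to 1 recovers the plain undiscounted sum of rewards, and smaller values weight near-term rewards more heavily.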

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1709.06293

Genre: Research Report (0.40)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.82)

Is Epicurus the father of Reinforcement Learning?

Vasilaki, Eleni

arXiv.org Machine Learning | Oct-12-2017

The Epicurean Philosophy is commonly thought of as simplistic and hedonistic. Here I discuss how this is a misconception and explore its link to Reinforcement Learning. Based on the letters of Epicurus, I construct an objective function for hedonism which turns out to be equivalent to the Reinforcement Learning objective function when omitting the discount factor. I then discuss how Plato's and Aristotle's views can also be loosely linked to Reinforcement Learning, as well as their weaknesses in relation to it. Finally, I emphasise the close affinity of the Epicurean views and the Bellman equation.

epicurus, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1710.04582

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer

Isele, David, Rostami, Mohammad, Eaton, Eric

arXiv.org Machine Learning | Oct-10-2017

Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1710.0385

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
(2 more...)
