AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

10 breakthrough technologies: Reinforcement learning - MIT Technology Review

Robohub, Mar-28-2017, 13:15:24 GMT

By experimenting, computers are figuring out how to do things that no programmer could teach them.

artificial intelligence, breakthrough technology, machine learning, (2 more...)

Genre: Research Report > Promising Solution (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

AI develops its own 'alien' language, the better to mock human underlings - ExtremeTech

#artificialintelligence, Mar-28-2017, 02:25:24 GMT

Even more amazing, the researchers never explicitly programmed this AI communication. Instead, it "evolved" as a response to a reinforcement learning problem. While the jargon can get a bit technical, the OpenAI blog does a decent job of parsing it. The important thing to grok is that the language was never defined, but rather hit upon as a solution to a general problem of learning to communicate. This type of AI method is called reinforcement learning, and involves the use of a reward signal to continually guide the agent towards an optimal outcome.

deep learning, extremetech, reinforcement learning, (3 more...)

Industry: Education > Focused Education > Special Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.31)

The meta-parameter slot machine

#artificialintelligence, Mar-28-2017, 00:40:28 GMT

Today we'll step back a bit and consider the psychology of a machine learning researcher at work, a subject which interests me deeply and one that I've already touched on in another post. Some of this comes from my own introspection, as I've been doing machine learning for quite a few years now. It is a well-known fact from biology that little achievements trigger the release of small amounts of dopamine, a neurotransmitter that is believed to be involved in reinforcement learning. The dopamine makes us feel good and also triggers plasticity in certain parts of the brain (likely allowing the brain to "remember" what behaviour led to the reward). Reinforcement learning, however, has its issues, since the reward can appear by coincidence and therefore reinforce the "wrong cause". This is very much visible these days with the Internet, emails and texts: since receiving an important and rewarding message reinforces the behaviour which led to it, and that most likely was pressing the "get mail" button, we get addicted to checking email!

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

#artificialintelligence, Mar-26-2017, 21:15:19 GMT

Our finding continues the modern trend of achieving strong results with decades-old ideas. For example, in 2012, the "AlexNet" paper showed how to design, scale and train convolutional neural networks (CNNs) to achieve extremely strong results on image recognition tasks, at a time when most researchers thought that CNNs were not a promising approach to computer vision. Similarly, in 2013, the Deep Q-Learning paper showed how to combine Q-Learning with CNNs to successfully solve Atari games, reinvigorating RL as a research field with exciting experimental (rather than theoretical) results. Likewise, our work demonstrates that ES achieves strong performance on RL benchmarks, dispelling the common belief that ES methods are impossible to apply to high dimensional problems. ES is easy to implement and scale.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

2017: The Year of Neuroevolution

#artificialintelligence, Mar-26-2017, 17:50:25 GMT

This month OpenAI published a paper, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning" by Tim Salimans, Jonathan Ho, Xi Chen and Ilya Sutskever, which shows that Evolution Strategies (ES) can be a strong alternative to Reinforcement Learning (RL) and have a number of advantages: ease of implementation, invariance to episode length and to settings with sparse rewards, better exploration behaviour than policy gradient methods, and ease of scaling in a distributed setting. Running on a computing cluster of 80 machines and 1,440 CPU cores, the authors' implementation was able to train a 3D MuJoCo humanoid walker in only 10 minutes (A3C on 32 cores takes about 10 hours). Using 720 cores they can also obtain comparable performance to A3C on Atari while cutting down the training time from 1 day to 1 hour. The communication overhead of implementing ES in a distributed setting is lower than for reinforcement learning methods such as policy gradients and Q-learning. By not requiring backpropagation, black box optimizers (those that make no assumptions about the structure of the function being optimized) reduce the amount of computation per episode by about two thirds, and memory by potentially much more.
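The black-box idea summarized above can be illustrated with a minimal sketch. It assumes a basic Gaussian-perturbation update with standardized rewards, which is one common form of ES and not necessarily the exact procedure used in the OpenAI paper. The function names, the toy reward and all hyperparameter values are hypothetical choices for illustration.

```python
import random


def evolution_strategy(f, theta, sigma=0.1, alpha=0.001,
                       population=50, iterations=300, seed=0):
    """Maximize f(theta) using only reward evaluations (no backpropagation).

    Illustrative sketch: sigma is the perturbation scale, alpha the step size.
    """
    rng = random.Random(seed)
    theta = list(theta)
    n = len(theta)
    for _ in range(iterations):
        # Sample Gaussian perturbations and score each perturbed parameter vector.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(n)]
                  for _ in range(population)]
        rewards = [f([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        # Standardize rewards so the update does not depend on reward scale.
        mean = sum(rewards) / population
        std = (sum((r - mean) ** 2 for r in rewards) / population) ** 0.5 or 1.0
        advantages = [(r - mean) / std for r in rewards]
        # Move theta along the reward-weighted average of the perturbations.
        for i in range(n):
            weighted = sum(a * eps[i] for a, eps in zip(advantages, noises))
            theta[i] += alpha * weighted / (population * sigma)
    return theta


def toy_reward(params):
    # Hypothetical objective: negative squared distance to a fixed target.
    target = [0.5, 0.1, -0.3]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


solution = evolution_strategy(toy_reward, [0.0, 0.0, 0.0])
print(solution)
```

Only reward values are used here; no gradient of `toy_reward` is ever computed, which is the sense in which the optimizer treats the objective as a black box. In a distributed setting each worker could evaluate a subset of the perturbations independently, though this sketch runs them in a single loop.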

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)

Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision

Hilleli, Bar, El-Yaniv, Ran

arXiv.org Artificial Intelligence, Mar-26-2017

We propose a scheme for training a computerized agent to perform complex human tasks such as highway steering. The scheme is designed to follow a natural learning process whereby a human instructor teaches a computerized trainee. The learning process consists of five elements: (i) unsupervised feature learning; (ii) supervised imitation learning; (iii) supervised reward induction; (iv) supervised safety module construction; and (v) reinforcement learning. We implemented the last four elements of the scheme using deep convolutional networks and applied it to successfully create a computerized agent capable of autonomous highway steering over the well-known racing game Assetto Corsa. We demonstrate that the use of the last four elements is essential to effectively carry out the steering task using vision alone, without access to a driving simulator's internals, and operating in wall-clock time. This is also made possible through the introduction of a safety network, a novel way of preventing the agent from making catastrophic mistakes during the reinforcement learning stage.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1612.01086

Genre: Research Report (0.50)

Industry:

Transportation (0.68)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Inverse Reinforcement Learning in Swarm Systems

Šošić, Adrian, KhudaBukhsh, Wasiur R., Zoubir, Abdelhak M., Koeppl, Heinz

arXiv.org Artificial Intelligence, Mar-24-2017

Inverse reinforcement learning (IRL) has become a useful tool for learning behavioral models from demonstration data. However, IRL remains mostly unexplored for multi-agent systems. In this paper, we show how the principle of IRL can be extended to homogeneous large-scale problems, inspired by the collective swarming behavior of natural systems. In particular, we make the following contributions to the field: 1) We introduce the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization. 2) Exploiting the inherent homogeneity of this framework, we reduce the resulting multi-agent IRL problem to a single-agent one by proving that the agent-specific value functions in this model coincide. 3) To solve the corresponding control problem, we propose a novel heterogeneous learning scheme that is particularly tailored to the swarm setting. Results on two example systems demonstrate that our framework is able to produce meaningful local reward models from which we can replicate the observed global system dynamics.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1602.0545

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)

Pit.ai puts a financial twist on reinforcement learning to outperform hedge funds

#artificialintelligence, Mar-23-2017, 13:00:43 GMT

Despite mystery and intrigue, the reality is that most hedge funds don't make money. This hasn't stopped a growing list of startups from trying their hands at employing machine learning to tip the scales in their favor. But Pit.ai, a new machine learning-powered hedge fund, adopted into the YC W17 class, thinks it can best Numerai, Quantopian and others with its own unique recipe for automating money making. Hedge funds employ aggressive trading strategies to "seek alpha," which is industry jargon for above market returns. These are not your standard trading shops, and over the last decade firms have gone to great lengths to seize data for information arbitrage.

artificial intelligence, machine learning, reinforcement learning, (7 more...)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)

Unsupervised Basis Function Adaptation for Reinforcement Learning

Barker, Edward, Ras, Charl

arXiv.org Machine Learning, Mar-23-2017

When using reinforcement learning (RL) algorithms to evaluate a policy it is common, given a large state space, to introduce some form of approximation architecture for the value function (VF). The exact form of this architecture can have a significant effect on the accuracy of the VF estimate, however, and determining a suitable approximation architecture can often be a highly complex task. Consequently there is a large amount of interest in the potential for allowing RL algorithms to adaptively generate (i.e. to learn) approximation architectures. We investigate a method of adapting approximation architectures which uses feedback regarding the frequency with which an agent has visited certain states to guide which areas of the state space to approximate with greater detail. We introduce an algorithm based upon this idea which adapts a state aggregation approximation architecture on-line. Assuming $S$ states, we demonstrate theoretically that - provided the following relatively non-restrictive assumptions are satisfied: (a) the number of cells $X$ in the state aggregation architecture is of order $\sqrt{S}\ln{S}\log_2{S}$ or greater, (b) the policy and transition function are close to deterministic, and (c) the prior for the transition function is uniformly distributed - our algorithm can guarantee, assuming we use an appropriate scoring function to measure VF error, error which is arbitrarily close to zero as $S$ becomes large. It is able to do this despite having only $O(X\log_2{S})$ space complexity (and negligible time complexity). We conclude by generating a set of empirical results which support the theoretical results.
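To give a sense of scale for condition (a), here is a rough worked instance. It assumes a hypothetical state count of $S = 10^6$ and ignores whatever constant factor "of order" hides, since the abstract does not state one.

```latex
\sqrt{S}\,\ln S\,\log_2 S
  \approx 1000 \times 13.82 \times 19.93
  \approx 2.75 \times 10^{5},
\qquad
\frac{2.75 \times 10^{5}}{10^{6}} \approx 0.28
```

Under that assumption the architecture would need roughly a quarter as many cells as there are states, and the ratio shrinks as $S$ grows because $\sqrt{S}\ln{S}\log_2{S}$ grows more slowly than $S$.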

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1703.0794

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Deep Exploration via Randomized Value Functions

Osband, Ian, Russo, Daniel, Wen, Zheng, Van Roy, Benjamin

arXiv.org Machine Learning, Mar-22-2017

We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through computational studies. We also prove a regret bound that establishes statistical efficiency with a tabular representation.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1703.07608

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
