AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Applying Adaptive Control in Modeling Human Motion Behaviors in Reinforcement Robotic Learning from Demonstrations

Tan, Huan (GE Global Research) | Zhao, Yang (GE Global Research) | Kannan, Balajee (GE Global Research)

AAAI Conferences, Nov-19-2016

In this paper, we propose to use an adaptive control method as the basis of a reinforcement learning algorithm for robotic imitation learning. In the learning stage, robots use the adaptive-control-based reinforcement learning algorithm to learn the parameters of dynamical systems. In the generation stage, robots use the learned dynamical system parameters and the pre-defined controller to drive the robot's configuration states along desired state trajectories. One simulation experiment and one practical experiment on a robot are carried out to validate the effectiveness of our algorithm. The experimental results show that learning of the system parameters converges very quickly and that the learned parameters improve the system's performance in generating similar motion trajectories.

machine learning, modeling human motion behavior, reinforcement learning, (4 more...)

AAAI Conferences

2016 AAAI Fall Symposium Series

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Towards Behavior-Aware Model Learning from Human-Generated Trajectories

Loftin, Robert Tyler (North Carolina State University) | MacGlashan, James (Brown University) | Peng, Bei (Washington State University) | Taylor, Matthew E. (Washington State University) | Littman, Michael L. (Brown University) | Roberts, David L. (North Carolina State University)

AAAI Conferences, Nov-19-2016

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process (MDP), based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.

behavior-aware model learning, machine learning, reinforcement learning, (2 more...)

AAAI Conferences

2016 AAAI Fall Symposium Series

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Dimensionality Reduced Reinforcement Learning for Assistive Robots

Curran, William (Oregon State University) | Brys, Tim (Vrije Universiteit Brussel) | Aha, David (Navy Center for Applied Research in AI) | Taylor, Matthew (Washington State University) | Smart, William D. (Oregon State University)

AAAI Conferences, Nov-19-2016

State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots leads to high-dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.
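The idea of learning in a low-dimensional hyperplane found from demonstrations can be illustrated with a minimal sketch. It assumes the hyperplane is obtained with principal component analysis (PCA), which the abstract does not state; the function names `fit_projection` and `project` and the synthetic demonstration data are hypothetical.

```python
import numpy as np

def fit_projection(demo_states, k):
    """Return (mean, components) for the top-k principal directions of the demo states."""
    mean = demo_states.mean(axis=0)
    centered = demo_states - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(state, mean, components):
    """Map a full state onto the k-dimensional hyperplane coordinates."""
    return components @ (state - mean)

rng = np.random.default_rng(0)
# Hypothetical demonstrations: 200 states in 6-D that vary mostly along 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
demo_states = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

# Iterating to higher dimensions: start with 1 component, then move to 2.
for k in (1, 2):
    mean, comps = fit_projection(demo_states, k)
    low_dim = project(demo_states[0], mean, comps)
    print(k, low_dim.shape)
```

A learner such as Q-learning would then operate on `low_dim` instead of the full state; how knowledge is transferred between iterations is not specified in the abstract and is left out here.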

artificial intelligence, dimensionality reduced reinforcement learning, machine learning, (1 more...)

AAAI Conferences

2016 AAAI Fall Symposium Series

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

From Games to Assembly Lines, Robots Learn Faster Than Ever

#artificialintelligence, Nov-18-2016, 17:15:33 GMT

A new artificial intelligence startup called Osaro aims to give industrial robots the same turbocharge that DeepMind Technologies gave Atari-playing computer programs. In December 2013, DeepMind showcased a type of artificial intelligence that had mastered seven Atari 2600 games from scratch in a matter of hours, and could outperform some of the best human players. Google swiftly snapped up the London-based company, and the deep-reinforcement learning technology behind it, for a reported $400 million. Now Osaro, with $3.3 million in investments from the likes of Peter Thiel and Jerry Yang, claims to have taken deep-reinforcement learning to the next level, delivering the same superhuman AI performance but over 100 times as fast. Deep-reinforcement learning arose from deep learning, a method of using multiple layers of neural networks to efficiently process and organize mountains of raw data (see "10 Breakthrough Technologies 2013: Deep Learning").

artificial intelligence, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Country: North America > United States > California > Alameda County > Berkeley (0.05)

Industry: Leisure & Entertainment > Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reward Function for q learning on a robot

#artificialintelligence, Nov-17-2016, 08:55:23 GMT

I have a two-wheeled differential-drive robot that uses PID for low-level control to follow a line. I implemented Q-learning, which collects samples for 16 iterations and then uses them to decide the best position to be on the line so that the car takes the turn from there. This gives the PID controller time to settle and makes for smooth, fast line following. My question is how I can set up a reward function that improves the performance, i.e. lets the Q-learning find the best ... What it tries to learn is this: it has 16 inputs, which contain the line positions for the last 15 iterations and the current iteration. The line position is between -1 and 1, where -1 means only the leftmost sensor sees the line and 0 means the line is in the center.
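One possible shape for such a reward, given as a sketch and not as an answer from the original thread: penalize distance from the line center and penalize abrupt changes between consecutive readings. The function names, the 0.5 weight, and the table-based `q_update` (which assumes the 16 readings have already been discretized into a state index) are all hypothetical choices.

```python
def reward(history):
    """history: the 16 most recent line positions in [-1, 1], newest last."""
    current = history[-1]
    previous = history[-2]
    centering = -current ** 2              # 0 when centered, -1 at either edge
    smoothness = -abs(current - previous)  # discourages oscillation across the line
    return centering + 0.5 * smoothness    # 0.5 is an arbitrary, tunable weight

def q_update(q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict of per-state action values."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (r + gamma * best_next - q[state][action])

# Usage with a tiny hypothetical table of two states and two actions.
q = {0: [0.0, 0.0], 1: [1.0, 0.0]}
r = reward([0.0] * 15 + [1.0])  # robot jumped from the center to the right edge
q_update(q, state=0, action=1, r=r, next_state=1)
```

Keeping the reward bounded and largest at the center gives the learner a dense signal on every iteration, which tends to be easier to learn from than a sparse reward given only when the line is lost.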

line position, machine learning, reinforcement learning, (5 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

[R] Reinforcement Learning with Unsupervised Auxiliary Tasks • /r/MachineLearning

#artificialintelligence, Nov-17-2016, 05:25:29 GMT

Can someone explain how the loss function works out in the model's favor in Section 3.4, "UNREAL Agent"? They combine the loss functions at first: the primary policy is trained with A3C, then the auxiliary tasks are trained on very recent sequences. Then it says "In practice, the loss is broken down into separate components that are applied either on-policy, directly from experience; or off-policy, on replayed transitions." What decides which of these components is applied on-policy and which off-policy?
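For reference, the combined objective the question refers to has roughly the following form. This is a sketch that should be checked against Section 3.4 of the paper: the subscripts (VR for value function replay, PC for pixel control, RP for reward prediction) and the weights $\lambda$ follow the paper's notation as best understood here.

```latex
\mathcal{L}_{\mathrm{UNREAL}}(\theta)
  = \mathcal{L}_{\mathrm{A3C}}
  + \lambda_{\mathrm{VR}} \, \mathcal{L}_{\mathrm{VR}}
  + \lambda_{\mathrm{PC}} \sum_{c} \mathcal{L}_{Q}^{(c)}
  + \lambda_{\mathrm{RP}} \, \mathcal{L}_{\mathrm{RP}}
```

On this reading, the A3C term is the component applied on-policy to fresh experience, while the auxiliary terms are computed on transitions drawn from a replay buffer; each term therefore uses the data source its own algorithm requires instead of a single shared one.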

artificial intelligence, machine learning, reinforcement learning, (1 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

5 EBooks to Read Before Getting into A Machine Learning Career

#artificialintelligence, Nov-16-2016, 20:41:22 GMT

Don't know where to start? If you are looking for something more, you could look here for an overview of MOOCs and freely available university lectures online. Of course, nothing substitutes for rigorous formal education, but let's say that isn't in the cards for whatever reason. Not all machine learning positions require a PhD; it really depends on where on the machine learning spectrum one wants to fit in. Check out this motivating and inspirational post, the author of which went from little understanding of machine learning to actively and effectively utilizing techniques in their job within a year.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Country: North America > United States > Minnesota (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.50)

Industry:

Education > Educational Setting > Online (0.70)
Education > Educational Setting > Higher Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Reinforcement Learning - Part 1

#artificialintelligence, Nov-14-2016, 20:05:41 GMT

I'm going to begin a multipart series of posts on Reinforcement Learning (RL) that roughly follows an old 1998 textbook, "Reinforcement Learning: An Introduction" by Sutton and Barto. From my research, this text still seems to be the most thorough introduction to RL I could find. The Sutton & Barto text is itself a great read and is fairly approachable even for beginners, but I still think it's worth breaking down even further. It still amazes me how most of machine learning theory was established decades ago, yet we've seen a huge explosion of interest and use in just the past several years, largely due to dramatic improvements in computational power (i.e.

artificial intelligence, machine learning, reinforcement learning, (5 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

Duan, Yan | Schulman, John | Chen, Xi | Bartlett, Peter L. | Sutskever, Ilya | Abbeel, Pieter

arXiv.org Machine Learning, Nov-9-2016

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL$^2$ experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-arm bandit problems and finite MDPs. After RL$^2$ is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL$^2$ on a vision-based navigation task and show that it scales up to high-dimensional problems.
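The input and state handling described in the abstract can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it uses a plain tanh recurrent cell with random, untrained weights, and the names `make_input` and `rnn_step`, the dimensions, and the dummy rollout are hypothetical.

```python
import numpy as np

def make_input(obs, action, reward, done, n_actions):
    """Concatenate observation, one-hot action, reward, and termination flag."""
    one_hot = np.zeros(n_actions)
    one_hot[action] = 1.0
    return np.concatenate([obs, one_hot, [reward], [float(done)]])

def rnn_step(h, x, w_h, w_x):
    """One step of a simple tanh recurrent cell (stand-in for the paper's RNN)."""
    return np.tanh(w_h @ h + w_x @ x)

rng = np.random.default_rng(0)
obs_dim, n_actions, hidden = 3, 2, 4
in_dim = obs_dim + n_actions + 2  # +2 for the reward and termination flag
w_h = 0.1 * rng.normal(size=(hidden, hidden))  # untrained placeholder weights
w_x = 0.1 * rng.normal(size=(hidden, in_dim))

h = np.zeros(hidden)  # reset only when a new MDP is sampled
for episode in range(2):
    for t in range(3):
        obs = rng.normal(size=obs_dim)  # dummy observation
        x = make_input(obs, action=t % n_actions, reward=1.0,
                       done=(t == 2), n_actions=n_actions)
        h = rnn_step(h, x, w_h, w_x)  # h is NOT reset between episodes
```

Carrying `h` across episode boundaries is what lets the activations accumulate knowledge about the current MDP, while the weights `w_h` and `w_x` would be the part trained by the outer "slow" RL algorithm.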

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1611.02779

Country: North America > United States > Massachusetts (0.28)

Genre: