 Reinforcement Learning


Applying Adaptive Control in Modeling Human Motion Behaviors in Reinforcement Robotic Learning from Demonstrations

AAAI Conferences

In this paper, we propose to use an adaptive control method as the basis of a reinforcement learning algorithm for robotic imitation learning. In the learning stage, robots use an adaptive control method-based reinforcement learning algorithm to learn the parameters of dynamical systems. In the generation stage, robots use the learned dynamical system parameters and the pre-defined controller to drive the configuration states of the robot along desired state trajectories. One simulation experiment and one practical experiment on a robot are carried out to validate the effectiveness of our algorithm. The experimental results show that the learning of the system parameters converges very fast and that the learned parameters improve the system's performance in generating similar motion trajectories.


Towards Behavior-Aware Model Learning from Human-Generated Trajectories

AAAI Conferences

Inverse reinforcement learning algorithms recover an unknown reward function for a Markov decision process (MDP), based on observations of user behaviors that optimize this reward function. Here we consider the complementary problem of learning the unknown transition dynamics of an MDP based on such observations. We describe the behavior-aware modeling (BAM) algorithm, which learns models of transition dynamics from user-generated state-action trajectories. BAM makes assumptions about how users select their actions that are similar to those used in inverse reinforcement learning, and searches for a model that maximizes the probability of the observed actions. The BAM algorithm is based on policy gradient algorithms, essentially reversing the roles of the policy and transition distribution in those algorithms. As a result, BAM is highly flexible, and can be applied to continuous state spaces using a wide variety of model representations. In this preliminary work, we discuss why the model learning problem is interesting, describe algorithms to solve this problem, and discuss directions for future work.


Dimensionality Reduced Reinforcement Learning for Assistive Robots

AAAI Conferences

State-of-the-art personal robots need to perform complex manipulation tasks to be viable in assistive scenarios. However, many of these robots, like the PR2, use manipulators with high degrees-of-freedom, and the problem is made worse in bimanual manipulation tasks. The complexity of these robots leads to high-dimensional state spaces, which are difficult to learn in. We reduce the state space by using demonstrations to discover a representative low-dimensional hyperplane in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, not all dimensions can be fully represented. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional hyperplane. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations. We show that adding IDRRL to the Q-Learning algorithm leads to faster learning on a set of mountain car tasks and the robot swimmers problem.
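
The idea of learning in a low-dimensional hyperplane found from demonstrations can be illustrated with a minimal sketch. The abstract does not say how the hyperplane is computed, so the code below assumes a principal-component projection (found by power iteration) onto a single dimension, matching the "first learning in a single dimension" step. The function names and the toy demonstration data are hypothetical and are not taken from the paper.

```python
import math


def first_principal_direction(states, iters=200):
    """Return (mean, unit direction of largest variance) for demonstrated states.

    Assumption: PCA via power iteration stands in for the paper's
    (unspecified) hyperplane discovery step.
    """
    n, d = len(states), len(states[0])
    mean = [sum(s[j] for s in states) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in states]
    # Sample covariance matrix of the demonstrated states.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Start from a normalized all-ones vector; this fails only if it happens
    # to be orthogonal to the principal direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def project(state, mean, direction):
    """Map a full state to its 1-D coordinate along the learned direction."""
    return sum((s - m) * u for s, m, u in zip(state, mean, direction))


# Toy demonstrations lying along the line y = 2x (hypothetical data).
demos = [(float(t), 2.0 * t) for t in range(5)]
mean, direction = first_principal_direction(demos)
low_dim_state = project((3.0, 6.0), mean, direction)
```

A learner such as tabular Q-learning could then discretize `low_dim_state` instead of the full state, which is what makes the search space smaller.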


From Games to Assembly Lines, Robots Learn Faster Than Ever

#artificialintelligence

A new artificial intelligence startup called Osaro aims to give industrial robots the same turbocharge that DeepMind Technologies gave Atari-playing computer programs. In December 2013, DeepMind showcased a type of artificial intelligence that had mastered seven Atari 2600 games from scratch in a matter of hours, and could outperform some of the best human players. Google swiftly snapped up the London-based company, and the deep-reinforcement learning technology behind it, for a reported $400 million. Now Osaro, with $3.3 million in investments from the likes of Peter Thiel and Jerry Yang, claims to have taken deep-reinforcement learning to the next level, delivering the same superhuman AI performance but over 100 times as fast. Deep-reinforcement learning arose from deep learning, a method of using multiple layers of neural networks to efficiently process and organize mountains of raw data (see "10 Breakthrough Technologies 2013: Deep Learning").


Reward Function for q learning on a robot

#artificialintelligence

I have a two-wheeled differential-drive robot that uses PID for low-level control to follow a line. I implemented Q-learning, which collects samples for 16 iterations and then uses them to decide the best position to be on the line, so the car takes the turn from there. This allows the PID to set up for smooth, fast following. My question is: how can I set up a reward function that improves the performance, i.e. lets the Q-learning find the best What it tries to learn is this: it has 16 inputs, which contain the line positions for the last 15 iterations and this iteration. The line position is between -1 and 1, where -1 means only the leftmost sensor sees the line and 0 means the line is in the center.
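
One possible shape for such a reward, given only the 16 line positions described above, is sketched here. This is an assumption and not the poster's code: it rewards keeping the line centered and subtracts a penalty for jitter across the window. The function name `line_follow_reward` and the `smooth_weight` parameter are hypothetical.

```python
def line_follow_reward(positions, smooth_weight=0.5):
    """Reward for a window of line positions, each in [-1, 1].

    positions: the last 15 readings followed by the current one (16 values).
    smooth_weight: hypothetical trade-off between centering and smoothness.
    """
    current = positions[-1]
    # 1.0 when the line is centered, 0.0 when only an outermost sensor sees it.
    centering = 1.0 - abs(current)
    # Mean absolute change between consecutive readings (0.0 when steady).
    steps = [abs(b - a) for a, b in zip(positions, positions[1:])]
    jitter = sum(steps) / len(steps)
    return centering - smooth_weight * jitter


steady = line_follow_reward([0.0] * 16)
late_swerve = line_follow_reward([0.0] * 15 + [1.0])
```

A steady, centered window scores highest, and a sudden swing to the edge is penalized both for being off-center and for the abrupt change.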


[R] Reinforcement Learning with Unsupervised Auxiliary Tasks • /r/MachineLearning

#artificialintelligence

Can someone explain how the loss function works out in the model's favor in Section 3.4, UNREAL Agent? They combine the loss functions at first: the primary policy is trained with A3C, then the auxiliary tasks are trained on very recent sequences. Then it says "In practice, the loss is broken down into separate components that are applied either on-policy, directly from experience; or off-policy, on replayed transitions." What decides which of these components is applied on-policy and which off-policy?
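
As a rough illustration of the split the quoted sentence describes, the sketch below groups the loss terms by where their data comes from. It assumes the breakdown given in the UNREAL paper: the A3C term is computed on-policy from fresh experience, while the value-replay, pixel-control, and reward-prediction terms are computed from the replay buffer. The function name and the default weights are placeholders, not the paper's values.

```python
def unreal_total_loss(a3c_loss, value_replay_loss, pixel_control_losses,
                      reward_pred_loss, lam_vr=1.0, lam_pc=1.0, lam_rp=1.0):
    """Combine per-component scalar losses into one training objective.

    a3c_loss: computed on-policy, directly from the newest rollout.
    The remaining terms are computed off-policy, on replayed transitions.
    The lam_* weights are hypothetical defaults.
    """
    on_policy = a3c_loss
    off_policy = (lam_vr * value_replay_loss
                  + lam_pc * sum(pixel_control_losses)
                  + lam_rp * reward_pred_loss)
    return on_policy + off_policy


total = unreal_total_loss(1.0, 2.0, [0.5, 0.5], 3.0, lam_pc=0.1)
```

In this reading, the choice is driven by what each term needs: the policy-gradient term needs data from the current policy, while the auxiliary terms can reuse stored transitions.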


5 EBooks to Read Before Getting into A Machine Learning Career

#artificialintelligence

Don't know where to start? If you are looking for something more, you could look here for an overview of MOOCs and online lectures from freely-available university lectures. Of course, nothing substitutes rigorous formal education, but let's say that isn't in the cards for whatever reason. Not all machine learning positions require a PhD; it really depends where on the machine learning spectrum one wants to fit in. Check out this motivating and inspirational post, the author of which went from little understanding of machine learning to actively and effectively utilizing techniques in their job within a year.


Reinforcement Learning - Part 1

#artificialintelligence

I'm going to begin a multipart series of posts on Reinforcement Learning (RL) that roughly follow an old 1998 textbook, "Reinforcement Learning: An Introduction" by Sutton and Barto. From my research, this text still seems to be the most thorough introduction to RL I could find. The Sutton & Barto text is itself a great read and is fairly approachable even for beginners, but I still think it's worth breaking down even further. It still amazes me how most of machine learning theory was established decades ago, yet we've seen a huge explosion of interest and use in just the past several years, largely due to dramatic improvements in computational power (i.e.


RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

arXiv.org Machine Learning

Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL$^2$, the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL$^2$ experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-arm bandit problems and finite MDPs. After RL$^2$ is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL$^2$ on a vision-based navigation task and show that it scales up to high-dimensional problems.
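
The input described above (observations, actions, rewards, and termination flags fed to the RNN at every step) can be sketched as a simple concatenation. This is an illustrative assumption about the encoding, with a one-hot previous action; the function name `rl2_input` and its argument names are hypothetical.

```python
def rl2_input(observation, prev_action, prev_reward, prev_done, num_actions):
    """Build one RNN input vector from what a typical RL algorithm would see.

    prev_action is None on the very first step of a trial, in which case
    the one-hot action slot stays all zeros.
    """
    one_hot = [0.0] * num_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    done_flag = 1.0 if prev_done else 0.0
    return list(observation) + one_hot + [float(prev_reward), done_flag]


step_input = rl2_input([0.2, -0.1], prev_action=1, prev_reward=1.0,
                       prev_done=False, num_actions=3)
```

Because the hidden state is kept across episodes within the same MDP, the termination flag is what tells the network that an episode boundary occurred.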