Reinforcement Learning
Watch a 'virtual stuntman' break dance and perform martial arts in machine learning breakthrough
Researchers have created a tool that will make simulations more realistic. A team at the University of California, Berkeley used deep reinforcement learning to let computer simulations mimic natural human movements. Their tool will allow video game characters to move and animated movie scenes to play out with the fluidity and rhythm of the real world. The recreations of natural movements will make simulations of animals and humans much less clumsy, a report on the new technology said. The technique will even improve scenes that include complex acrobatic feats, such as martial arts and break dancing.
Successor Features for Transfer in Reinforcement Learning
Barreto, André, Dabney, Will, Munos, Rémi, Hunt, Jonathan J., Schaul, Tom, van Hasselt, Hado, Silver, David
Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement", a generalization of dynamic programming's policy improvement operation that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows the free exchange of information across tasks. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach in firm theoretical ground and present experiments that show that it successfully promotes transfer in practice, significantly outperforming alternative methods in a sequence of navigation tasks and in the control of a simulated robotic arm.
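The two ideas in this abstract can be illustrated with a small sketch. It assumes, as is standard for successor features, that rewards are linear in some features with task weights w, so that the action value of a policy i is the dot product of its successor features psi_i(s, a) with w. Generalized policy improvement then picks the action whose best value across all stored policies is highest. The function names, the toy numbers, and the list-based layout are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gpi_action(
    psi_per_policy: List[List[Sequence[float]]],
    w: Sequence[float],
) -> Tuple[int, float]:
    """Generalized policy improvement at a single state.

    psi_per_policy[i][a] holds the successor features of stored policy i
    for action a at the current state (hypothetical layout).
    Returns the greedy action and its value max_i psi_i(s, a) . w.
    """
    num_actions = len(psi_per_policy[0])
    best_action, best_value = 0, float("-inf")
    for a in range(num_actions):
        # Value of action a under the best of the stored policies.
        value = max(dot(psi[a], w) for psi in psi_per_policy)
        if value > best_value:
            best_action, best_value = a, value
    return best_action, best_value


# Toy example: two stored policies, two actions, two reward features.
psi_1 = [[1.0, 0.0], [0.0, 1.0]]
psi_2 = [[0.5, 0.5], [0.2, 2.0]]

# Changing only the task weights w changes the chosen action;
# the successor features (the dynamics part) are reused as-is.
task_a = gpi_action([psi_1, psi_2], w=[1.0, 0.0])
task_b = gpi_action([psi_1, psi_2], w=[0.0, 1.0])
```

Because a new task only changes w, a policy for it can be read off from the stored successor features before any further learning, which is the setting the performance guarantee in the abstract refers to.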
Towards a virtual stuntman
Motion control problems have become standard benchmarks for reinforcement learning, and deep RL methods have been shown to be effective for a diverse suite of tasks ranging from manipulation to locomotion. However, characters trained with deep RL often exhibit unnatural behaviours, bearing artifacts such as jittering, asymmetric gaits, and excessive movement of limbs. Can we train our characters to produce more natural behaviours? A wealth of inspiration can be drawn from computer graphics, where the physics-based simulation of natural movements has been a subject of intense study for decades. The greater emphasis placed on motion quality is often motivated by applications in film, visual effects, and games.
Advanced Artificial Intelligence Projects with Python
Considered the Holy Grail of automation, data analysis, and robotics, Artificial Intelligence has taken the world by storm as a major field of research and development. Python has surfaced as a dominant language in AI/ML programming because of its simplicity and flexibility, in addition to its great support for open source libraries such as spaCy and TensorFlow. This video course is built for those with a basic understanding of artificial intelligence, introducing them to advanced artificial intelligence projects as they go ahead. The first project introduces natural language processing including part-of-speech tagging and named entity extraction. Wikipedia articles are used to demonstrate the extraction of keywords, and the Enron email archive is mined for mentions and relationships of people, places, and organizations.
Diving deeper into Reinforcement Learning with Q-Learning
Today we'll learn about Q-Learning. Q-Learning is a value-based Reinforcement Learning algorithm. This article is the second part of a free series of blog posts about Deep Reinforcement Learning. See the first article here. In this article you'll learn: Let's say you're a knight and you need to save the princess trapped in the castle shown on the map above. You can move one tile at a time.
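To make "value-based" concrete, here is a minimal sketch of the standard tabular Q-Learning update, assuming a table keyed by (state, action) pairs. The state and action names, the learning rate alpha, and the discount gamma are illustrative choices and do not come from the article. Terminal states are not handled, to keep the sketch short.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one Q-Learning step and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical tiles and moves; unseen entries default to 0.0.
q_table = defaultdict(float)
moves = ["left", "right"]

first = q_update(q_table, "tile_0", "right", 1.0, "tile_1", moves, alpha=0.5)

# Pretend a later tile has already been found to be valuable.
q_table[("tile_1", "right")] = 2.0
second = q_update(q_table, "tile_0", "right", 0.0, "tile_1", moves, alpha=0.5)
```

The second call earns no reward, yet the estimate for tile_0 still rises because value flows backwards from tile_1.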
Personalized Dynamics Models for Adaptive Assistive Navigation Interfaces
Ohn-Bar, Eshed, Kitani, Kris, Asakawa, Chieko
We explore the role of personalization for assistive navigational systems (e.g., service robot, wearable system or smartphone app) that guide visually impaired users through speech, sound and haptic-based instructional guidance. Based on our analysis of real-world users, we show that the dynamics of blind users cannot be accounted for by a single universal model but instead must be learned on an individual basis. To learn personalized instructional interfaces, we propose PING (Personalized INstruction Generation agent), a model-based reinforcement learning framework which aims to quickly adapt its state transition dynamics model to match the reactions of the user using a novel end-to-end learned weighted majority-based regression algorithm. In our experiments, we show that PING learns dynamics models significantly faster compared to baseline transfer learning approaches on real-world data. We find that through better reasoning over personal mobility nuances, interaction with surrounding obstacles, and the current navigation task, PING is able to improve the performance of instructional assistive navigation at the most crucial junctions such as turns or veering paths. To enable sufficient planning time over user responses, we emphasize prediction of human motion for long horizons. Specifically, the learned dynamics models are shown to consistently improve long-term position prediction by over 1 meter on average (nearly the width of a hallway) compared to baseline approaches even when considering a prediction horizon of 20 seconds into the future.
DORA The Explorer: Directed Outreaching Reinforcement Action-Selection
Choshen, Leshem, Fox, Lior, Loewenstein, Yonatan
Exploration is a fundamental aspect of Reinforcement Learning, typically implemented using stochastic action-selection. Exploration, however, can be more efficient if directed toward gaining new world knowledge. Visit-counters have been proven useful both in practice and in theory for directed exploration. However, a major limitation of counters is their locality. While there are a few model-based solutions to this shortcoming, a model-free approach is still missing. We propose $E$-values, a generalization of counters that can be used to evaluate the propagating exploratory value over state-action trajectories. We compare our approach to commonly used RL techniques, and show that using $E$-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state of the art performance in the Freeway Atari 2600 game.
Market Making via Reinforcement Learning
Spooner, Thomas, Fearnley, John, Savani, Rahul, Koukorinis, Andreas
Market making is a fundamental trading problem in which an agent provides liquidity by continually offering to buy and sell a security. The problem is challenging due to inventory risk, the risk of accumulating an unfavourable position and ultimately losing money. In this paper, we develop a high-fidelity simulation of limit order book markets, and use it to design a market making agent using temporal-difference reinforcement learning. We use a linear combination of tile codings as a value function approximator, and design a custom reward function that controls inventory risk. We demonstrate the effectiveness of our approach by showing that our agent outperforms both simple benchmark strategies and a recent online learning approach from the literature.
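As a rough illustration of the function approximator mentioned here, the sketch below implements a single one-dimensional tile coding with offset tilings and a linear TD(0) update on its weights. It is a simplification under stated assumptions: the paper combines several tile codings and uses its own state variables and reward, none of which are reproduced here. All names and numbers are hypothetical.

```python
def active_tiles(x, low, high, tiles_per_tiling, num_tilings):
    """Return one active weight index per tiling for a scalar x in [low, high]."""
    width = (high - low) / tiles_per_tiling
    indices = []
    for t in range(num_tilings):
        # Each tiling is shifted by a fraction of the tile width.
        offset = t * width / num_tilings
        tile = int((x - low + offset) / width)
        # Each tiling owns tiles_per_tiling + 1 tiles to cover the shift.
        tile = min(tile, tiles_per_tiling)
        indices.append(t * (tiles_per_tiling + 1) + tile)
    return indices


def value(weights, tiles):
    """Linear value estimate: the sum of the weights of the active tiles."""
    return sum(weights[i] for i in tiles)


def td0_update(weights, tiles, reward, next_tiles, alpha, gamma):
    """One TD(0) step on the active weights; returns the TD error."""
    td_error = reward + gamma * value(weights, next_tiles) - value(weights, tiles)
    step = alpha / len(tiles)  # split the step size across tilings
    for i in tiles:
        weights[i] += step * td_error
    return td_error


# Toy usage: 2 tilings of 4 tiles each over the interval [0, 1].
weights = [0.0] * (2 * (4 + 1))
tiles_now = active_tiles(0.5, 0.0, 1.0, 4, 2)
tiles_next = active_tiles(1.0, 0.0, 1.0, 4, 2)
error = td0_update(weights, tiles_now, 1.0, tiles_next, alpha=0.5, gamma=0.9)
```

Nearby inputs share some tiles, so an update at one point generalizes to its neighbours while the approximator stays linear in the weights.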
Model-Based Action Exploration for Learning Dynamic Motion Skills
Berseth, Glen, van de Panne, Michiel
Deep reinforcement learning has made great strides in solving challenging motion control tasks. Recently, there has been significant work on methods for exploiting the data gathered during training, but there has been less work on how to best generate the data to learn from. For continuous action domains, the most common method for generating exploratory actions involves sampling from a Gaussian distribution centred around the mean action output by a policy. Although these methods can be quite capable, they do not scale well with the dimensionality of the action space, and can be dangerous to apply on hardware. We consider learning a forward dynamics model to predict the result, ($x_{t+1}$), of taking a particular action, ($u$), given a specific observation of the state, ($x_{t}$). With this model we perform internal look-ahead predictions of outcomes and seek actions we believe have a reasonable chance of success. This method alters the exploratory action space, thereby increasing learning speed and enabling higher quality solutions to difficult problems, such as robotic locomotion and juggling.
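The contrast drawn in this abstract can be sketched in a few lines. The first function is the common Gaussian exploration scheme. The second uses a forward model to look ahead and keep the most promising candidate. This is a simple sampling-based stand-in chosen for illustration, not the authors' procedure, and the toy forward model, the scoring function, and all parameter values are assumptions.

```python
import random


def gaussian_exploration(mean_action, sigma, rng):
    """Common scheme: perturb the policy's mean action with Gaussian noise."""
    return [m + rng.gauss(0.0, sigma) for m in mean_action]


def model_guided_action(state, mean_action, forward_model, score,
                        sigma, num_candidates, rng):
    """Look ahead with a forward model and keep the best-scoring candidate.

    forward_model(x_t, u) predicts x_{t+1}; score(x_{t+1}) rates the outcome.
    The mean action is always a candidate, so the result is never predicted
    to be worse than the unperturbed policy output.
    """
    candidates = [list(mean_action)]
    candidates += [gaussian_exploration(mean_action, sigma, rng)
                   for _ in range(num_candidates)]
    return max(candidates, key=lambda u: score(forward_model(state, u)))


# Toy stand-ins (hypothetical): 1-D state, additive dynamics, target at 1.0.
def toy_forward_model(x, u):
    return [x[0] + u[0]]


def toy_score(x_next):
    return -abs(x_next[0] - 1.0)


rng = random.Random(0)
x_t = [0.0]
mean_u = [0.2]
chosen_u = model_guided_action(x_t, mean_u, toy_forward_model, toy_score,
                               sigma=0.5, num_candidates=16, rng=rng)
```

In a real system the forward model would itself be learned from data, so its predictions are only as reliable as the states it has seen.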
Advanced Statistics for Machine Learning Udemy
Complex statistics in Machine Learning worry a lot of developers. Knowing statistics helps you build strong Machine Learning models that are optimized for a given problem statement. This video will teach you all it takes to perform the complex statistical computations required for Machine Learning. You will gain information on statistics behind unsupervised learning, reinforcement learning, and more. You'll master real-world examples that discuss the statistical side of Machine Learning.