Reinforcement Learning
Deep Reinforcement Learning framework for Autonomous Driving
Sallab, Ahmad El, Abdou, Mohammed, Perot, Etienne, Yogamani, Senthil
Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes. Despite its perceived utility, it has not yet been successfully applied in automotive applications. Motivated by the successful demonstrations of learning of Atari games and Go by Google DeepMind, we propose a framework for autonomous driving using deep reinforcement learning. This is of particular relevance as it is difficult to pose autonomous driving as a supervised learning problem due to strong interactions with the environment including other vehicles, pedestrians and roadworks. As it is a relatively new area of research for autonomous driving, we provide a short overview of deep reinforcement learning and then describe our proposed framework. It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios. It also integrates the recent work on attention models to focus on relevant information, thereby reducing the computational complexity for deployment on embedded hardware. The framework was tested in an open source 3D car racing simulator called TORCS. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction of other vehicles.
Elon Musk's OpenAI has unveiled an unusual approach to building smarter machines
In 2013 a British artificial-intelligence startup called DeepMind surprised computer scientists by showing off software that could learn to play classic Atari games better than an expert human player. DeepMind was soon acquired by Google, and the technique that beat the Atari games, reinforcement learning, has become a hot topic in the field of AI and robotics. Google used reinforcement learning to create software that beat a champion Go player last year. Now OpenAI, a nonprofit research institute cofounded and funded by Elon Musk, says it has discovered that an easier-to-use alternative to reinforcement learning can get rival results when it plays games and performs other tasks. At MIT Technology Review's EmTech Digital conference in San Francisco on Monday, OpenAI's research director, Ilya Sutskever, said that could allow researchers to make progress in machine learning faster.
Combining policy gradient and Q-learning
O'Donoghue, Brendan, Munos, Remi, Kavukcuoglu, Koray, Mnih, Volodymyr
Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new technique that combines policy gradient with off-policy Q-learning, drawing experience from a replay buffer. This is motivated by making a connection between the fixed points of the regularized policy gradient algorithm and the Q-values. This connection allows us to estimate the Q-values from the action preferences of the policy, to which we apply Q-learning updates. We refer to the new technique as 'PGQL', for policy gradient and Q-learning. We also establish an equivalency between action-value fitting techniques and actor-critic algorithms, showing that regularized policy gradient techniques can be interpreted as advantage function learning algorithms. We conclude with some numerical examples that demonstrate improved data efficiency and stability of PGQL. In particular, we tested PGQL on the full suite of Atari games and achieved performance exceeding that of both asynchronous advantage actor-critic (A3C) and Q-learning.
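A minimal sketch of the connection described in this abstract, under stated assumptions: the policy is taken to be a softmax over action preferences, and the Q-values implied by the policy are assumed to have the form alpha * (log pi(a|s) + H(pi)) + V(s), where alpha is the entropy regularization weight. The function names, the separate state-value input, and the numbers are all hypothetical illustrations, not the authors' implementation.

```python
import math

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def q_from_policy(prefs, value, alpha):
    """Q estimate implied by the policy (assumed form):
    Q(s, a) = alpha * (log pi(a|s) + H(pi)) + V(s)."""
    pi = softmax(prefs)
    entropy = -sum(p * math.log(p) for p in pi)
    return [alpha * (math.log(p) + entropy) + value for p in pi]

def td_error(q_sa, reward, gamma, q_next):
    """One-step Q-learning error for a transition drawn from a replay buffer."""
    return reward + gamma * max(q_next) - q_sa

# Hypothetical replayed transition: preferences and values are made up.
q_now = q_from_policy([1.0, 0.5, -0.2], value=2.0, alpha=0.1)
q_next = q_from_policy([0.3, 0.3, 0.9], value=1.5, alpha=0.1)
delta = td_error(q_now[0], reward=1.0, gamma=0.99, q_next=q_next)
```

Under this assumed form, the policy-weighted average of the Q estimates equals V(s), so the log-probability term acts as a scaled advantage estimate. A combined update would apply a Q-learning step on `delta` alongside the usual policy gradient step.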
Microsoft Maluuba teaches management 101 to machines in its first paper since being acquired
In mid-January, the ongoing race for AI put Montreal-based Maluuba on our radar. Microsoft acquired the startup and its team of researchers to build better machine intelligence tools for analyzing unstructured text to enable more natural human-computer interaction -- think bots that can actually respond with reasonable intelligence to a text you send. The team has now released its first paper since being acquired, and it sheds light on the group's priorities. The paper outlines a method for multi-advisor reinforcement learning that breaks problems down into simpler, more easily computable pieces. In oversimplified terms, Maluuba is effectively trying to teach leadership to groups of machines working to solve problems.
Risk-Constrained Reinforcement Learning with Percentile Risk Criteria
Chow, Yinlam, Ghavamzadeh, Mohammad, Janson, Lucas, Pavone, Marco
In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate this gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application.
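To make the CVaR constraint and the multiplier update in step (3) concrete, here is a small sketch under stated assumptions. It estimates CVaR from sampled cumulative costs using the standard form nu + E[(C - nu)^+] / (1 - alpha), evaluated at the empirical value-at-risk, and then takes one projected ascent step on a Lagrange multiplier for the constraint CVaR <= beta. The function names, the sample costs, and the step size are hypothetical, and the policy update itself is omitted; this is not the paper's algorithm.

```python
import math

def empirical_var(costs, alpha):
    """Smallest sampled cost z such that at least an alpha fraction of costs are <= z."""
    ordered = sorted(costs)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]

def empirical_cvar(costs, alpha):
    """CVaR estimate: nu + mean((C - nu)^+) / (1 - alpha), with nu set to the empirical VaR."""
    nu = empirical_var(costs, alpha)
    excess = sum(max(c - nu, 0.0) for c in costs) / len(costs)
    return nu + excess / (1.0 - alpha)

def update_multiplier(lam, cvar, beta, step):
    """Ascent step on the multiplier for CVaR <= beta, projected onto lam >= 0."""
    return max(0.0, lam + step * (cvar - beta))

# Hypothetical cumulative costs from ten sampled trajectories.
costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cvar = empirical_cvar(costs, alpha=0.5)
lam = update_multiplier(1.0, cvar, beta=6.0, step=0.1)
```

The multiplier grows while the estimated CVaR exceeds the threshold beta and shrinks toward zero once the constraint is satisfied, which is what lets the penalty on risky policies adapt during training.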
Peter Stone: Robot Skill Learning: From the Real World to Simulation and Back (CMU RI Seminar)
Abstract: "For autonomous robots to operate in the open, dynamically changing world, they will need to be able to learn a robust set of interacting skills. This talk begins by introducing 'Overlapping Layered Learning' as a novel hierarchical machine learning paradigm for learning such interacting skills in simulation. While learning in simulation is appealing because it avoids the prohibitive sample cost of learning in the real world, unfortunately policies learned in simulation often fail when applied on physical robots. This talk then introduces 'Grounded Simulation Learning' to address this problem by algorithmically altering the simulator to better match the real world, and connects this new algorithm to a theoretical analysis of off-policy evaluation in reinforcement learning. Overlapping Layered Learning was the key deciding factor in UT Austin Villa's RoboCup robot soccer 3D simulation league championship, and Grounded Simulation Learning has led to the fastest known stable walk on a widely used humanoid robot."
The Next Challenges for Reinforcement Learning
Recent years have seen great progress for AI. In particular, artificial agents have learned to classify images and recognize speech at near-human level. However, for artificial agents to reach their full potential, they should not only observe, but also act and learn from the consequences of their actions. Learning how to behave is especially important when an agent interacts with humans through natural language, because of the complexity of language and because each person has a different communication style. Reinforcement learning (RL) is the area of research that is concerned with learning effective behavior in a data-driven way.
Apple's Artificial Intelligence Guru Talks About a Sci-Fi Future
Artificial intelligence has made great progress in helping computers recognize images in photos and recommend products online that you're more likely to buy. But the technology still faces many challenges, especially when it comes to computers remembering things like humans do. On Tuesday, Apple's director of AI research, Ruslan Salakhutdinov, discussed some of those limitations. However, during his talk at an MIT Technology Review conference, he steered clear of how his secretive company incorporates AI into products like Siri. Salakhutdinov, who joined Apple in October, said he is particularly interested in a type of AI known as reinforcement learning, which researchers use to teach computers to repeatedly take different actions to figure out the best possible result.
Deep learning boosted AI. Now the next big thing in machine intelligence is coming
Inside a simple computer simulation, a group of self-driving cars are performing a crazy-looking maneuver on a four-lane virtual highway. Half are trying to move from the right-hand lanes just as the other half try to merge from the left. It seems like just the sort of tricky thing that might flummox a robot vehicle, but they manage it with precision. I'm watching the driving simulation at the biggest artificial-intelligence conference of the year, held in Barcelona this past December. What's most amazing is that the software governing the cars' behavior wasn't programmed in the conventional sense at all.