Reinforcement Learning
[P] Deep reinforcement learning tutorial, battleship • /r/MachineLearning
I think this application is kind of fascinating. There is a probability distribution maintained on the board of possible ship locations, and samples are made based on this estimate. This "solves" battleship in a way. But I believe it uses a Monte Carlo search to get the probability densities. A properly trained CNN might be able to do this in a single forward pass.
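As a rough illustration of the Monte Carlo idea mentioned in this comment, here is a minimal sketch, not the tutorial's actual code. It assumes a single ship of known length on a square board and a set of cells already known to be misses. The names `placement_cells` and `occupancy_estimate` are hypothetical. The sketch samples random placements, rejects those that overlap a miss, and counts how often each cell is covered.

```python
import random


def placement_cells(row, col, length, horizontal):
    """Cells covered by a ship whose first cell is (row, col)."""
    if horizontal:
        return [(row, col + i) for i in range(length)]
    return [(row + i, col) for i in range(length)]


def occupancy_estimate(size, length, misses, samples=20000, seed=0):
    """Estimate P(cell is occupied) by rejection sampling.

    Assumption: one ship, uniform prior over all legal placements.
    `misses` is a set of (row, col) cells known to be empty.
    """
    rng = random.Random(seed)
    counts = [[0] * size for _ in range(size)]
    accepted = 0
    for _ in range(samples):
        horizontal = rng.random() < 0.5
        if horizontal:
            row = rng.randrange(size)
            col = rng.randrange(size - length + 1)
        else:
            row = rng.randrange(size - length + 1)
            col = rng.randrange(size)
        cells = placement_cells(row, col, length, horizontal)
        # Reject placements that contradict what we already observed.
        if any(cell in misses for cell in cells):
            continue
        accepted += 1
        for r, c in cells:
            counts[r][c] += 1
    if accepted == 0:
        raise ValueError("no sampled placement is consistent with the misses")
    return [[n / accepted for n in row_counts] for row_counts in counts]


if __name__ == "__main__":
    probs = occupancy_estimate(size=5, length=3, misses={(2, 2)})
    for row_probs in probs:
        print(" ".join(f"{p:.2f}" for p in row_probs))
```

A full game would need several ships, hits as well as misses, and non-overlap constraints. The CNN suggested in the comment would aim to output a similar per-cell map directly from the board state.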
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
Shalev-Shwartz, Shai, Shammah, Shaked, Shashua, Amnon
Autonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns and while pushing ahead in unstructured urban roadways. Since there are many possible scenarios, manually tackling all possible cases will likely yield a too simplistic policy. Moreover, one must balance between unexpected behavior of other drivers/pedestrians and at the same time not be too defensive, so that normal traffic flow is maintained. In this paper we apply deep reinforcement learning to the problem of forming long term driving strategies. We note that there are two major challenges that make autonomous driving different from other robotic tasks. First is the necessity of ensuring functional safety, something that machine learning has difficulty with given that performance is optimized at the level of an expectation over many instances. Second, the Markov Decision Process model often used in robotics is problematic in our case because of unpredictable behavior of other agents in this multi-agent scenario. We make three contributions in our work. First, we show how policy gradient iterations can be used without Markovian assumptions. Second, we decompose the problem into a composition of a Policy for Desires (which is to be learned) and trajectory planning with hard constraints (which is not learned). The goal of Desires is to enable comfort of driving, while hard constraints guarantee the safety of driving. Third, we introduce a hierarchical temporal abstraction we call an "Option Graph" with a gating mechanism that significantly reduces the effective horizon and thereby reduces the variance of the gradient estimation even further.
Tech Giants Partner on Artificial Intelligence
The Partnership on AI (inventive name, it is not) has brought together Amazon, Google, Facebook, IBM, Microsoft, and others to debate best practices and host A.I.-related events. The Partnership on AI isn't the first high-profile collaboration among tech luminaries to tackle the heavy questions surrounding artificial intelligence and machine learning. Earlier this year, Tesla CEO Elon Musk joined with venture capitalist Peter Thiel and others to launch OpenAI, a non-profit "artificial intelligence research company" devoted to developing A.I. that's friendly to humanity. While both OpenAI and the Partnership on AI are focused on promoting ethical A.I. research, as well as advancing public understanding of the potential (and pitfalls) of machine learning, OpenAI has pushed ahead in offering materials and toolkits for researchers. The OpenAI Gym, for example, is a platform for building reinforcement learning (RL) algorithms, a vital aspect of artificial-intelligence development.
Deep Reinforcement Learning with Online Generalized Advantage Estimation – Tom Breloff
Deep Reinforcement Learning, or Deep RL, is a really hot field at the moment. If you haven't heard of it, pay attention. Combining the power of reinforcement learning and deep learning, it is being used to play complex games better than humans, control driverless cars, optimize robotic decisions and limb trajectories, and much more. And we haven't even gotten started… Deep RL has far-reaching applications in business, finance, health care, and many other fields which could be improved with better decision making. It's the closest (practical) approach we have to AGI.
AI trained to slay players in a computer game could one day lead to killer robots
Two students have built an AI that could be the basis of future killer robots. In a controversial move, the pair trained an AI bot to kill human players within the classic video game Doom. Critics have expressed concern over the AI technology and the risk it could pose to humans in future. Devendra Chaplot and Guillaume Lample, from Carnegie Mellon University in Pittsburgh, trained an AI bot, nicknamed Arnold, using 'deep reinforcement learning' techniques. While Google's AI software had previously been shown to tackle vintage 2D Atari games such as Space Invaders, the students wanted to expand the technology to tackle three-dimensional first-person shooter games like Doom.
Learning Reinforcement Learning (with Code, Exercises and Solutions)
Skip all the talk and go directly to the Github Repo with code and exercises. Reinforcement Learning is one of the fields I'm most excited about. Over the past few years amazing results like learning to play Atari Games from raw pixels and Mastering the Game of Go have gotten a lot of attention, but RL is also widely used in Robotics, Image Processing and Natural Language Processing. Combining Reinforcement Learning and Deep Learning techniques works extremely well. Both fields heavily influence each other.
Deep Static and Dynamic Level Analysis: A Study on Infinite Mario
Guzdial, Matthew James (Georgia Institute of Technology) | Sturtevant, Nathan (University of Denver) | Li, Boyang (Disney Research)
Automatic analysis of game levels can provide assistance to game designers and procedural content generation. We introduce a static-dynamic scale to categorize level analysis strategies, which captures the extent that the analysis depends on player simulation. Due to its ability to automatically learn intermediate representations for the task, a convolutional neural network (CNN) provides a general tool for both types of analysis. In this paper, we explore the use of CNN to analyze 1,437 Infinite Mario levels. We further propose a deep reinforcement learning technique for dynamic analysis, which allows the simulated player to pay a penalty to reduce error in its control. We empirically demonstrate the effectiveness of our techniques and complementarity of dynamic and static analysis.
"Easy" reinforcement learning tasks for sanity checking? • /r/MachineLearning
In deep supervised learning, you can overfit a small dataset as a sanity check: making sure your model is implemented correctly and can actually learn before going on to train on your real, big dataset. Are there similar strategies in reinforcement learning, where one can get results in a few minutes before moving on to spend a day training on Space Invaders or even Pong?
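One possible answer, offered only as a sketch and not taken from the thread: a tiny corridor task where the optimal policy is obvious ("always move right"), so a correct learner should find it in well under a second. The environment, the function name `train_corridor`, and all hyperparameters below are illustrative assumptions. The learner is plain tabular Q-learning rather than a deep model, but the same environment can be swapped in as a smoke test for a deep agent.

```python
import random


def train_corridor(length=5, episodes=300, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor (hypothetical sanity-check task).

    States are 0..length-1, the agent starts at 0, and reaching the last
    state ends the episode with reward 1. Actions: 0 = left, 1 = right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(length)]  # q[state][action]
    goal = length - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so an episode always terminates
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # Standard one-step Q-learning target.
            target = reward if done else gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train_corridor()
    for s, (left, right) in enumerate(q_table[:-1]):
        print(f"state {s}: left={left:.3f} right={right:.3f}")
```

If the learned values do not prefer "right" in every non-terminal state, the bug is most likely in the update rule or the environment step, not in the hyperparameters.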
Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
Yahya, Ali, Li, Adrian, Kalakrishnan, Mrinal, Chebotar, Yevgen, Levine, Sergey
In principle, reinforcement learning and policy search methods can enable robots to learn highly complex and general skills that may allow them to function amid the complexity and diversity of the real world. However, training a policy that generalizes well across a wide range of real-world conditions requires far greater quantity and diversity of experience than is practical to collect with a single robot. Fortunately, it is possible for multiple robots to share their experience with one another, and thereby, learn a policy collectively. In this work, we explore distributed and asynchronous policy learning as a means to achieve generalization and improved training times on challenging, real-world manipulation tasks. We propose a distributed and asynchronous version of Guided Policy Search and use it to demonstrate collective policy learning on a vision-based door opening task using four robots. We show that it achieves better generalization, utilization, and training times than the single robot alternative.