Reinforcement Learning
Exploration for Multi-task Reinforcement Learning with Deep Generative Models
Bangaru, Sai Praveen, Suhas, JS, Ravindran, Balaraman
Exploration in multi-task reinforcement learning is critical in training agents to deduce the underlying MDP. Many of the existing exploration frameworks, such as $E^3$, $R_{max}$, and Thompson sampling, assume a single stationary MDP and are not suitable for system identification in the multi-task setting. We present a novel method to facilitate exploration in multi-task reinforcement learning using deep generative models. We supplement our method with a low-dimensional energy model to learn the underlying MDP distribution and provide a resilient and adaptive exploration signal to the agent. We evaluate our method on a new set of environments and provide an intuitive interpretation of our results.
Reinforcement Learning and AI
If you polled a group of data scientists just a few years back about how many machine learning problem types there are, you would almost certainly have gotten a binary response: problem types were clearly divided into supervised and unsupervised. Supervised: You've got labeled data (clearly defined examples). Unsupervised: You've got data but it's not labeled. See if there's a structure in there.
Deep Reinforcement Learning for Multi-Domain Dialogue Systems
Cuayáhuitl, Heriberto, Yu, Seunghak, Williamson, Ashley, Carse, Jacob
Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning---termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) versus NDQN (proposed) using simulations report that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems.
Google's DeepMind AI grasps basic laws of physics
Google DeepMind's artificial intelligence team, alongside researchers at the University of California, Berkeley, has trained AI machines to interact with objects in order to evaluate their properties without any prior awareness of physical laws. The research project drew inspiration from child development and sought to train AI to mirror human capacity to interact with physical objects and infer properties such as mass, friction, and malleability. The study, entitled Learning to perform physics experiments via deep reinforcement learning, explained that while recent advances in AI have achieved 'superhuman performance' in complex control problems and other processing tasks, the machines still lack a common sense understanding of our physical world: 'it is not clear that these systems can rival the scientific intuition of even a young child.' Lead researcher Misha Denil and his team set about various trials in different virtual environments in which the AI was faced with a series of blocks and tasked with assessing their properties. In the first simulation, called Which is Heavier, the AI was given a set of four blocks which were the same size but varied in mass.
Jetson Developer Meetup
Get to know some intelligent machines and the developers who built them. Join us for a night of cocktails, appetizers and tech talks, and learn how our partners, developers and start-ups are using the Jetson TX1 AI supercomputer to create intelligent devices to solve tomorrow's problems today. Meet Jetson partners and hear first-hand how they took their projects from idea to reality. Get to know folks from Horus, Parrot and many more: companies that are using Jetson every day! And, of course, we'll have swag.
Messing around with OpenAI Gym
First of all, it might be useful to explain what OpenAI Gym actually does: OpenAI Gym aims to provide an easy environment to develop and test reinforcement learning algorithms. To be clear, OpenAI Gym doesn't power any algorithms itself, leaving it up to more specialised packages like TensorFlow or Theano. So what makes this the ultimate geek toy for AI researchers? Well, this is because of the many environments OpenAI Gym provides, one of them being the 'atari' environment. That's right, you can test the performance of your reinforcement learning algorithms on a variety of different Atari games and, what's more, you can automatically upload the performance of your algorithms and compare them to other people's approaches.
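To make the "environment" idea concrete, here is a minimal sketch of the agent-environment loop that Gym standardises. It deliberately does not import Gym: `CoinFlipEnv` and `run_episode` are hypothetical names invented for this illustration, and the toy task (guess the coin you are shown) is an assumption chosen only to keep the example tiny. The interface mirrors the classic Gym convention, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`; newer Gym releases changed these signatures, so check the version you have installed.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the classic Gym interface."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 when the action matches the coin that was just observed.
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)
        done = self.t >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward collected by `policy`."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = CoinFlipEnv()
    print("copy-the-observation policy:", run_episode(env, lambda obs: obs))
    print("always-zero policy:", run_episode(env, lambda obs: 0))
```

Because the learning algorithm only ever touches `reset` and `step`, the same `run_episode` loop would work unchanged against any environment exposing that interface, which is the point of a shared environment API.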
kidzik/osim-rl
OpenSim is a biomechanical physics environment for musculoskeletal simulations. The biomechanical community has designed a range of musculoskeletal models compatible with this environment. These models can be, for example, fit to clinical data to understand underlying causes of injuries using inverse kinematics and inverse dynamics. For many of these models there are controllers designed for forward simulations of movement; however, they are often finely tuned for the model and data. Advancements in reinforcement learning may allow building more robust controllers, which can in turn provide another tool for validating the models.
Google's DeepMind AI -- "Grasps Basic Laws of Physics"
When encountering novel objects, humans and other animals are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal-driven way. This process of active interaction is in the same spirit as a scientist performing an experiment to discover hidden facts. The study, entitled Learning to perform physics experiments via deep reinforcement learning, explained that while recent advances in AI have achieved 'superhuman performance' in complex control problems and other processing tasks, the machines still lack a common sense understanding of our physical world: 'it is not clear that these systems can rival the scientific intuition of even a young child.' "We found," the team concluded, "that state of art deep reinforcement learning methods can learn to perform the experiments necessary to discover these hidden properties of the physical world. By systematically manipulating the problem difficulty and the cost incurred by the AI agent for performing experiments, we found that agents learn different strategies that balance the cost of gathering information against the cost of making mistakes in different situations."
Memory Lens: How Much Memory Does an Agent Use?
Dann, Christoph, Hofmann, Katja, Nowozin, Sebastian
We propose a new method to study the internal memory used by reinforcement learning policies. We estimate the amount of relevant past information by estimating mutual information between behavior histories and the current action of an agent. We perform this estimation in the passive setting, that is, we do not intervene but merely observe the natural behavior of the agent. Moreover, we provide a theoretical justification for our approach by showing that it yields an implementation-independent lower bound on the minimal memory capacity of any agent that implements the observed policy. We demonstrate our approach by estimating the use of memory of DQN policies on concatenated Atari frames, revealing sharply different use of memory across 49 games. The study of memory as information that flows from the past to the current action opens avenues to understand and improve successful reinforcement learning algorithms.
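The core quantity in this abstract, mutual information between a behavior history and the current action, can be illustrated with a small sketch. This is not the authors' estimator, which the abstract does not specify. It is a simple plug-in (empirical frequency) estimate for discrete data, and it makes the simplifying assumption that the "history" is just the previous k actions. The function names `mutual_information` and `history_action_pairs` are hypothetical. Plug-in estimates are biased upward when samples are few relative to the number of distinct histories, so treat the numbers as illustrative.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(H; A) in bits from a list of (history, action) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    hist = Counter(h for h, _ in pairs)
    act = Counter(a for _, a in pairs)
    mi = 0.0
    for (h, a), c in joint.items():
        # p(h, a) * log2( p(h, a) / (p(h) * p(a)) ), with all p as empirical frequencies
        mi += (c / n) * log2(c * n / (hist[h] * act[a]))
    return mi


def history_action_pairs(actions, k):
    """Pair each action with the tuple of the k actions that preceded it."""
    return [(tuple(actions[i - k:i]), actions[i]) for i in range(k, len(actions))]


if __name__ == "__main__":
    # A purely passive log of an agent that strictly alternates between two actions.
    observed = [0, 1] * 50
    for k in (0, 1, 2):
        estimate = mutual_information(history_action_pairs(observed, k))
        print(f"k={k}: estimated I(history; action) = {estimate:.3f} bits")
```

For the alternating agent, an empty history (k=0) carries no information about the next action, while a single past action (k=1) determines it almost completely, so the estimate rises from zero to roughly one bit. Comparing estimates across history lengths in this way is one rough reading of "how much of the past the policy uses".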
Probabilistic Verification for Cognitive Models
Junges, Sebastian (RWTH Aachen University) | Jansen, Nils (University of Texas at Austin) | Katoen, Joost-Pieter (RWTH Aachen University) | Topcu, Ufuk (University of Texas at Austin)
Many robotics applications and scenarios that involve interaction with humans are safety or performance critical. A natural path to assessing such notions is to include a cognitive model describing typical human behaviors into a larger modeling context. In this work, we set out to investigate a combination of such a model with formal verification. We present a general and flexible framework utilizing methods from probabilistic model checking and discuss current pitfalls. We start from information about typical behavior, obtained from generalizing specific scenarios by the usage of inverse reinforcement learning. We translate this information in order to define a formal model exhibiting stochastic behavior (whenever significant data is present) or nondeterminism (if the model is underspecified or no significant data is present) that can be analyzed. This model for a human can be combined with a robot model by using standard parallel composition. The benefit is manifold: First, safe or optimal strategies for involved robots regarding a human can be synthesized depending on the given model. In general, verification can determine if such benign strategies are even possible. Furthermore, the cognitive model itself can be analyzed with respect to possible unnatural behaviors; thereby feedback to developers of such models is provided. We evaluate and describe our approaches by means of a well-known model for visuomotor tasks and provide a framework that can readily incorporate other models.