MobILE: Model-Based Imitation Learning From Observation Alone

Neural Information Processing Systems

This paper studies Imitation Learning from Observations alone (ILFO), where the learner is presented with expert demonstrations that consist only of the states visited by an expert, without access to the actions the expert took. We present MobILE, a provably efficient model-based framework for solving the ILFO problem. MobILE carefully trades off exploration against imitation; this is achieved by integrating the idea of optimism in the face of uncertainty into the distribution-matching imitation learning (IL) framework. We provide a unified analysis for MobILE and demonstrate that it enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well-studied notions of complexity. We also show that the ILFO problem is strictly harder than the standard IL problem by reducing ILFO to a multi-armed bandit problem, indicating that exploration is necessary for solving ILFO efficiently. We complement these theoretical results with experimental simulations on benchmark OpenAI Gym tasks that indicate the efficacy of MobILE.
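The abstract describes MobILE only at a high level. As a hedged illustration of what "optimism in the face of uncertainty" can mean inside a distribution-matching objective, the toy sketch below scores each candidate policy by the total-variation distance between its model-predicted state distribution and the expert's state distribution, then subtracts a count-based exploration bonus so that rarely tried policies look optimistically good. The function names, the total-variation discrepancy, and the `1/sqrt(n+1)` bonus are illustrative assumptions, not MobILE's actual estimator, policy class, or bonus.

```python
import math
from collections import Counter


def empirical_distribution(states):
    """Normalize a list of visited (hashable) states into a probability table."""
    counts = Counter(states)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def total_variation(p, q):
    """Total-variation distance between two discrete distributions (dicts)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def exploration_bonus(visit_count, scale=1.0):
    """Hypothetical count-based bonus: large for rarely tried options, shrinking as 1/sqrt(n+1)."""
    return scale / math.sqrt(visit_count + 1)


def optimistic_choice(candidates, expert_states, visit_counts, scale=1.0):
    """Pick the candidate policy with the lowest optimistic imitation cost.

    candidates: dict name -> list of states a learned model predicts that policy visits
        (only states are compared, since ILFO demonstrations contain no actions).
    visit_counts: dict name -> times the policy was run in the real environment,
        used here as a crude stand-in for model uncertainty.
    """
    expert = empirical_distribution(expert_states)

    def cost(name):
        predicted = empirical_distribution(candidates[name])
        mismatch = total_variation(predicted, expert)
        return mismatch - exploration_bonus(visit_counts[name], scale)

    return min(candidates, key=cost)


expert = [0, 1, 2, 3]
candidates = {"a": [0, 1, 2, 3], "b": [0, 0, 1, 1]}
counts = {"a": 100, "b": 0}
print(optimistic_choice(candidates, expert, counts, scale=1.0))  # b
print(optimistic_choice(candidates, expert, counts, scale=0.1))  # a
```

With a large bonus scale, the untried candidate `"b"` is selected despite matching the expert worse, which is the exploration side of the trade-off. With a small scale, the well-matched and well-explored candidate `"a"` wins, which is the imitation side.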


Provable Representation Learning for Imitation Learning via Bi-level Optimization

Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi

arXiv.org Artificial Intelligence

A common strategy in modern learning systems is to learn a representation that is useful for many tasks, also known as representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where trajectories from multiple experts are available. We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for two imitation learning settings: behavior cloning and imitation from observation alone. Theoretically, we use this framework to show that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.
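To make the outer/inner split concrete, here is a minimal toy sketch under assumptions that are ours, not the paper's: the shared representation is a linear projection `phi(x) = u . x` of a 2-D state onto one feature, each task's inner problem is least-squares behavior cloning of a scalar head `theta` (solved in closed form), and the outer problem is solved by brute-force search over unit directions `u`. The function names are hypothetical. The point is only that the outer objective is evaluated by re-solving every task's inner problem for each candidate representation.

```python
import math


def inner_fit(u, xs, actions):
    """Inner problem: best task-specific scalar head theta for a fixed
    representation phi(x) = u . x, via closed-form least squares."""
    feats = [u[0] * x[0] + u[1] * x[1] for x in xs]
    denom = sum(f * f for f in feats)
    theta = sum(f * a for f, a in zip(feats, actions)) / denom if denom > 0 else 0.0
    loss = sum((a - theta * f) ** 2 for f, a in zip(feats, actions))
    return theta, loss


def outer_objective(u, tasks):
    """Outer objective: total imitation loss after each task re-fits its own head."""
    return sum(inner_fit(u, xs, acts)[1] for xs, acts in tasks)


def learn_representation(tasks, grid=180):
    """Outer problem, solved here by brute-force search over unit directions u
    (the loss is invariant to rescaling u, so angles in [0, pi) suffice)."""
    best_u, best_loss = None, float("inf")
    for k in range(grid):
        angle = math.pi * k / grid
        u = (math.cos(angle), math.sin(angle))
        loss = outer_objective(u, tasks)
        if loss < best_loss:
            best_u, best_loss = u, loss
    return best_u, best_loss


# Two toy "experts" whose actions depend only on the first state coordinate.
xs = [(1.0, 2.0), (2.0, -1.0), (0.5, 0.5), (-1.0, 3.0)]
tasks = [(xs, [2.0 * x[0] for x in xs]), (xs, [-3.0 * x[0] for x in xs])]
print(learn_representation(tasks))  # ((1.0, 0.0), 0.0)
```

Because both toy experts act on the first coordinate only, the outer search recovers `u = (1.0, 0.0)`, after which each task needs just one scalar parameter. This is a small-scale version of the sample-sharing intuition behind the abstract's claim.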


A New AI Learns Through Observation Alone: What That Means for Drone Surveillance

A breakthrough will allow machines to learn by observing. The technique, which its inventors have named Turing Learning, promises smarter drones that could detect militants engaging in behavior that could endanger troops, such as planting roadside bombs. Still in its infancy, the new machine-learning technique is named for the British mathematician Alan Turing, whose famous test challenges an artificial intelligence to fool a human into thinking he or she is conversing with another human. In Turing Learning, a program dubbed the "classifier" tries to learn about a system designed to fool it. In certain ways, Turing Learning resembles many existing machine-learning systems.