An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Neural Information Processing Systems

We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is particularly important in sim-to-real transfer because simulators inevitably model real-world dynamics imperfectly. In this paper, we show that one existing solution to this transfer problem -- grounded action transformation -- is closely related to the problem of imitation from observation (IfO): learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for grounded transfer learning. To validate our hypothesis, we derive a new algorithm -- generative adversarial reinforced action transformation (GARAT) -- based on adversarial imitation from observation techniques. We run experiments in several domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the target environment compared to existing black-box transfer methods.
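To make the grounded-action-transformation idea concrete, here is a toy numeric sketch, not the paper's GARAT algorithm: a 1-D point mass whose simulator and "real" environment disagree by a hidden actuation scale, and an action transformer fit by least squares so that the simulator's transitions under transformed actions match observed real transitions. The environments, the linear transformer, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_step(s, a):
    # Simulator dynamics: full actuation.
    return s + a

def real_step(s, a):
    # "Real" dynamics: weaker actuation (the dynamics mismatch).
    return s + 0.5 * a

# Collect (s, a, s') transitions from the target environment.
states = rng.uniform(-1.0, 1.0, size=200)
actions = rng.uniform(-1.0, 1.0, size=200)
next_real = real_step(states, actions)

# Fit a linear action transformer a_g = k * a so that
# sim_step(s, k * a) = s + k * a matches the observed real
# next states: k * a ≈ next_real - s, solved by least squares.
k = np.sum(actions * (next_real - states)) / np.sum(actions ** 2)
```

Running a policy in the simulator with actions rescaled by `k` now reproduces the target dynamics; GARAT learns this transformation adversarially instead of by regression, and for general nonlinear dynamics.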


Review for NeurIPS paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Neural Information Processing Systems

Weaknesses: - The cost function formulation and the adversarial objective minimized are based on prior work of [17] and [43]. Given this, the proposed approach does not offer significant technical novelty. Additionally, the experiments are based on sim-to-sim evaluation, where there are two simulators for a task and one of them is called 'real'. I do not see such a characterization as acceptable.


Review for NeurIPS paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Neural Information Processing Systems

Summary: This paper proposes a new technique for learning to transfer optimal policies obtained from a simulator to a real-world environment. The only difference between sim and real is in the state transition probabilities. The main idea is to learn an action grounding function that maps state-actions learned in simulation to modified actions that are executed in the real system. The authors notice that this problem is similar to a variant of imitation learning, where the imitator learns to match state trajectories (where the actions are unknown) demonstrated by an expert. Experiments are conducted on MuJoCo tasks where the "real" environment is obtained by modifying physical properties (such as mass and friction) from their values in simulation.
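The imitation-from-observation variant mentioned above can be illustrated with a minimal sketch. This uses an inverse-dynamics (BCO-style) recovery of actions rather than the adversarial objective the paper employs, and the 1-D dynamics, expert policy, and names are illustrative assumptions: given state-only expert trajectories, the imitator first infers the missing actions, then fits a policy to the inferred state-action pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sim_step(s, a):
    # Known simulator dynamics for a 1-D point mass.
    return s + a

# Expert demonstrations: states only. The expert's policy
# (a = -0.8 * s, driving the state toward 0) is hidden.
expert_states = rng.uniform(-1.0, 1.0, size=100)
expert_next = sim_step(expert_states, -0.8 * expert_states)

# Step 1: recover the unobserved actions with the simulator's
# inverse dynamics (here trivially a = s' - s).
inferred_actions = expert_next - expert_states

# Step 2: behavior-clone a linear policy a = w * s on the
# inferred pairs via least squares.
w = np.sum(expert_states * inferred_actions) / np.sum(expert_states ** 2)
```

The imitator recovers the expert's gain purely from state trajectories; adversarial IfO methods replace step 2 with a discriminator over state transitions, which is the connection the paper exploits for action grounding.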

