Seeing Differently, Acting Similarly: Imitation Learning with Heterogeneous Observations
Cai, Xin-Qiang; Ding, Yao-Xiang; Chen, Zi-Xuan; Jiang, Yuan; Sugiyama, Masashi; Zhou, Zhi-Hua
arXiv.org Artificial Intelligence
In many real-world imitation learning tasks, the demonstrator and the learner have to act in different but full observation spaces. This situation poses significant obstacles for existing imitation learning approaches, even when they are combined with traditional space adaptation techniques. The main challenge lies in bridging the expert's occupancy measures and the learner's dynamically changing occupancy measures under the different observation spaces. In this work, we model this learning problem as Heterogeneous Observations Imitation Learning (HOIL). We propose the Importance Weighting with REjection (IWRE) algorithm, based on importance weighting, learning with rejection, and active querying, to solve the key challenge of occupancy measure matching. Experimental results show that IWRE can successfully solve HOIL tasks, including the challenging task of transforming vision-based demonstrations into random access memory (RAM)-based policies in the Atari domain.
Jun-17-2021
- Country:
  - Oceania > Australia
    - Queensland > Brisbane (0.04)
    - New South Wales > Sydney (0.04)
  - North America
    - United States
      - Texas > Travis County
        - Austin (0.04)
      - New York
        - Richmond County > New York City (0.04)
        - Queens County > New York City (0.04)
        - New York County > New York City (0.04)
        - Kings County > New York City (0.04)
        - Bronx County > New York City (0.04)
      - California > Los Angeles County
        - Long Beach (0.14)
    - Puerto Rico > San Juan
      - San Juan (0.04)
    - Canada > Quebec
      - Capitale-Nationale Region
        - Québec (0.04)
        - Quebec City (0.04)
  - Europe
  - Asia
- Genre:
  - Research Report > New Finding (0.66)
- Industry:
  - Leisure & Entertainment > Games > Computer Games (0.46)
- Technology: