Learning Visually Guided Latent Actions for Assistive Teleoperation
Karamcheti, Siddharth, Zhai, Albert J., Losey, Dylan P., Sadigh, Dorsa
arXiv.org Artificial Intelligence
It is challenging for humans -- particularly those living with physical disabilities -- to control high-dimensional, dexterous robots. Prior work explores learning embedding functions that map a human's low-dimensional inputs (e.g., via a joystick) to complex, high-dimensional robot actions for assistive teleoperation; however, a central problem is that there are many more high-dimensional actions than available low-dimensional inputs. To extract the correct action and maximally assist their human controller, robots must reason over their context: for example, pressing a joystick down when interacting with a coffee cup indicates a different action than when interacting with a knife. In this work, we develop assistive robots that condition their latent embeddings on visual inputs. We explore a spectrum of visual encoders and show that incorporating object detectors pretrained on small amounts of cheap, easy-to-collect structured data enables i) accurately and robustly recognizing the current context and ii) generalizing control embeddings to new objects and tasks. In user studies with a high-dimensional physical robot arm, participants leverage this approach to perform new tasks with unseen objects. Our results indicate that structured visual representations improve few-shot performance and are subjectively preferred by users.
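The core idea in the abstract -- a latent-action decoder that maps a low-dimensional joystick input to a high-dimensional robot action, conditioned on a visual context embedding -- can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensions, the function names, and the randomly initialized weights (standing in for a trained decoder) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 2    # joystick axes (the low-dimensional human input)
CONTEXT_DIM = 8   # e.g., pooled features from an object detector (assumed size)
ACTION_DIM = 7    # joint velocities of a 7-DoF arm
HIDDEN = 16

# Random weights stand in for a trained decoder network.
W1 = rng.standard_normal((LATENT_DIM + CONTEXT_DIM, HIDDEN)) * 0.1
W2 = rng.standard_normal((HIDDEN, ACTION_DIM)) * 0.1

def decode_action(z, context):
    """Map a (latent input, visual context) pair to a high-dimensional action.

    Conditioning on `context` lets the same joystick input decode to
    different actions for different detected objects.
    """
    x = np.concatenate([z, context])
    h = np.tanh(x @ W1)   # small MLP decoder (illustrative)
    return h @ W2

# Pressing the joystick "down" decodes to different actions depending
# on the visual context (e.g., a coffee cup vs. a knife).
z_down = np.array([0.0, -1.0])
ctx_cup = rng.standard_normal(CONTEXT_DIM)
ctx_knife = rng.standard_normal(CONTEXT_DIM)

a_cup = decode_action(z_down, ctx_cup)
a_knife = decode_action(z_down, ctx_knife)
```

Because the decoder consumes the context embedding alongside the latent input, the same 2-D command produces distinct 7-D actions for the two objects, which is the behavior the abstract motivates.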
May-2-2021