Zero-Shot Visual Generalization in Robot Manipulation

Sumeet Batra, Gaurav Sukhatme

arXiv.org Artificial Intelligence 

A key requirement of any generalist robot system deployed in the real world is the ability to perform tasks across visually diverse environments. High-dimensional inputs such as RGB images offer rich information but also introduce complexity due to the curse of dimensionality. Given the enormous diversity of real-world visual data, accounting for every possible variation within a fixed dataset is intractable. How to extract the underlying structural knowledge of the world from visual data while remaining robust to semantically irrelevant visual perturbations is still an open question. The robot learning field has largely relied on one of several trends. One is to train agents in simulation, where visual complexity can be controlled and large-scale, diverse synthetic data can be generated efficiently with GPU-accelerated simulators [1, 2, 3]. However, transferring policies trained in simulation to the real world is hindered by the "Sim2Real" gap, which is caused by mismatches in fidelity and by unmodeled dynamics. Domain randomization is the leading strategy for closing this gap: it varies the simulation parameters so that real-world conditions fall within the distribution of the training data. Domain randomization has proven effective in both simulated benchmarks and real-world robotic tasks when the data diversity is sufficiently large [4, 5, 6]. A seemingly unrelated but conceptually similar approach to visual generalization in the age of foundation models has been to train large

Figure 1: Behavior cloning with disentangled representations and associative latent dynamics achieves zero-shot generalization to various real-world perturbations, such as changes in ambient lighting (left), object color (middle-left), directed lighting (middle-right), and the presence of distractor objects (right).
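The domain randomization idea described above can be illustrated with a minimal sketch, assuming a hypothetical simulator that accepts a per-episode set of visual settings. The names `VisualParams` and `sample_visual_params`, as well as every numeric range, are illustrative assumptions and are not taken from the paper. The randomized factors simply mirror the perturbation types listed in Figure 1 (ambient lighting, directed lighting, object color, distractor objects).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualParams:
    """Hypothetical per-episode visual settings for a simulated scene."""
    ambient_intensity: float      # global light level
    light_azimuth_deg: float      # directed light: horizontal angle
    light_elevation_deg: float    # directed light: vertical angle
    object_rgb: tuple             # target object color, each channel in [0, 1)
    num_distractors: int          # extra objects placed in the scene


# Illustrative ranges (assumptions only): the intent is to make them wide
# enough that conditions expected at deployment fall inside them.
AMBIENT_RANGE = (0.2, 1.5)
AZIMUTH_RANGE = (0.0, 360.0)
ELEVATION_RANGE = (10.0, 80.0)
MAX_DISTRACTORS = 5


def sample_visual_params(rng: random.Random) -> VisualParams:
    """Draw one randomized scene configuration, sampling each factor independently."""
    return VisualParams(
        ambient_intensity=rng.uniform(*AMBIENT_RANGE),
        light_azimuth_deg=rng.uniform(*AZIMUTH_RANGE),
        light_elevation_deg=rng.uniform(*ELEVATION_RANGE),
        object_rgb=tuple(rng.random() for _ in range(3)),
        num_distractors=rng.randint(0, MAX_DISTRACTORS),  # inclusive upper bound
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Each training episode would be rendered under a freshly sampled configuration.
    for episode in range(3):
        print(episode, sample_visual_params(rng))
```

Sampling each factor independently from a uniform range keeps the sketch simple; a real pipeline might also randomize textures, camera pose, or physical parameters, and might use non-uniform distributions.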
