Zero-Shot Visual Generalization in Robot Manipulation
Sumeet Batra, Gaurav Sukhatme
arXiv.org Artificial Intelligence
A key requirement of any generalist robot system deployed in the real world is the ability to perform tasks across visually diverse environments. High-dimensional inputs such as RGB images offer rich information but also introduce complexity due to the curse of dimensionality. Given the enormous diversity of real-world visual data, accounting for every possible variation within a fixed dataset is intractable. How to extract the underlying structural knowledge of the world from visual data while remaining robust to semantically irrelevant visual perturbations is still an open question. The robot learning field has largely followed one of several trends. One is to train agents in simulation, where visual complexity can be controlled and large-scale, diverse synthetic data can be generated efficiently with GPU-accelerated simulators [1, 2, 3]. However, transferring policies trained in simulation to the real world is hindered by the "Sim2Real" gap, which is caused by mismatches in fidelity and unmodeled dynamics. Domain randomization is the leading strategy for closing this gap: it varies the simulation parameters so that real-world conditions fall within the distribution of the training data. Domain randomization has proven effective in both simulated benchmarks and real-world robotic tasks when the data diversity is sufficiently large [4, 5, 6]. A seemingly unrelated but conceptually similar approach to visual generalization in the age of foundation models has been to train large
Figure 1: Behavior cloning with disentangled representations and associative latent dynamics achieves zero-shot generalization to various real-world perturbations, such as changes in ambient lighting (left), object color (middle-left), directed lighting (middle-right), and the presence of distractor objects (right).
May-20-2025
- Genre:
- Research Report (1.00)
- Technology: