Visual language maps for robot navigation – Google AI Blog
People are excellent navigators of the physical world, due in part to their remarkable ability to build cognitive maps that form the basis of spatial memory. These maps support everything from localizing landmarks at varying ontological levels (like a book on a shelf in the living room) to determining whether a layout permits navigation from point A to point B. Building robots that are proficient at navigation requires an interconnected understanding of (a) vision and natural language (to associate landmarks or follow instructions), and (b) spatial reasoning (to connect a map representing an environment to the true spatial distribution of objects). While there have been many recent advances in training joint visual-language models on Internet-scale data, how best to connect them to a spatial representation of the physical world that robots can use remains an open research question.

To explore this, we collaborated with researchers at the University of Freiburg and the University of Technology Nuremberg to develop Visual Language Maps (VLMaps), a map representation that directly fuses pre-trained visual-language embeddings into a 3D reconstruction of the environment. VLMaps, which is set to appear at ICRA 2023, is a simple approach that allows robots to (1) index visual landmarks in the map using natural language descriptions, (2) employ Code as Policies to navigate to spatial goals, such as "go in between the sofa and TV" or "move three meters to the right of the chair", and (3) generate open-vocabulary obstacle maps, which allow multiple robots with different morphologies (mobile manipulators vs. drones, for example) to use the same VLMap for path planning. VLMaps can be used out of the box without additional labeled data or model fine-tuning, and it outperforms other zero-shot methods by over 17% on challenging object-goal and spatial-goal navigation tasks in Habitat and Matterport3D.
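The core idea of fusing visual-language embeddings into a map and then indexing it with language can be illustrated with a toy sketch. It assumes the expensive steps are already done: every point has been back-projected into the world frame (using depth and camera pose) and carries an embedding from some pre-trained visual-language encoder. Hand-written 3-dimensional vectors stand in for real image and text embeddings. `ToyVLMap`, `cells_for`, and `point_between` are hypothetical names chosen for illustration, not the API of the VLMaps codebase. Averaging embeddings per top-down grid cell and labeling each cell with its most similar text embedding is one simple way to realize the idea, not a definitive reproduction of the method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[int, int]
Embedding = List[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class ToyVLMap:
    """Hypothetical top-down grid: each cell keeps the mean embedding of
    all world-frame points that project into it."""

    def __init__(self, cell_size: float = 0.05) -> None:
        self.cell_size = cell_size
        self._sums: Dict[Cell, Embedding] = {}
        self._counts: Dict[Cell, int] = defaultdict(int)

    def cell_of(self, x: float, y: float) -> Cell:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def cell_center(self, cell: Cell) -> Tuple[float, float]:
        return ((cell[0] + 0.5) * self.cell_size, (cell[1] + 0.5) * self.cell_size)

    def fuse(self, points: Iterable[Tuple[float, float, float, Embedding]]) -> None:
        """Accumulate (x, y, z, embedding) points; z is dropped (top-down view)."""
        for x, y, _z, emb in points:
            cell = self.cell_of(x, y)
            acc = self._sums.setdefault(cell, [0.0] * len(emb))
            for i, v in enumerate(emb):
                acc[i] += v
            self._counts[cell] += 1

    def labels(self, text_embs: Dict[str, Embedding]) -> Dict[Cell, str]:
        """Label each cell with the text whose embedding best matches the cell's mean."""
        result: Dict[Cell, str] = {}
        for cell, acc in self._sums.items():
            mean = [v / self._counts[cell] for v in acc]
            result[cell] = max(text_embs, key=lambda name: cosine(mean, text_embs[name]))
        return result

    def cells_for(self, text_embs: Dict[str, Embedding], wanted: Set[str]) -> Set[Cell]:
        """Mask of cells whose best label is in `wanted` (a landmark or obstacle query)."""
        return {c for c, name in self.labels(text_embs).items() if name in wanted}

    def centroid(self, cells: Set[Cell]) -> Tuple[float, float]:
        centers = [self.cell_center(c) for c in cells]
        return (sum(p[0] for p in centers) / len(centers),
                sum(p[1] for p in centers) / len(centers))


def point_between(vlmap: ToyVLMap, text_embs: Dict[str, Embedding], a: str, b: str) -> Tuple[float, float]:
    """Hypothetical spatial primitive for a goal like "go in between the sofa and TV":
    the midpoint of the two landmarks' centroids."""
    ca = vlmap.centroid(vlmap.cells_for(text_embs, {a}))
    cb = vlmap.centroid(vlmap.cells_for(text_embs, {b}))
    return ((ca[0] + cb[0]) / 2, (ca[1] + cb[1]) / 2)


# Toy stand-ins for text embeddings of an open-vocabulary label list.
text_embs = {"sofa": [1.0, 0.0, 0.0], "tv": [0.0, 1.0, 0.0], "floor": [0.0, 0.0, 1.0]}

vlmap = ToyVLMap(cell_size=0.5)
vlmap.fuse([
    (0.1, 0.1, 0.4, [0.9, 0.1, 0.0]),  # sofa-like pixel
    (0.2, 0.3, 0.5, [0.8, 0.0, 0.2]),  # sofa-like pixel, same cell
    (2.1, 0.2, 1.0, [0.1, 0.9, 0.0]),  # tv-like pixel
    (1.1, 0.1, 0.0, [0.0, 0.1, 0.9]),  # floor-like pixel
])

print(point_between(vlmap, text_embs, "sofa", "tv"))  # (1.25, 0.25)

# Obstacle map for a hypothetical ground robot: every furniture label blocks it.
print(sorted(vlmap.cells_for(text_embs, {"sofa", "tv"})))  # [(0, 0), (4, 0)]
```

Because obstacle maps in this sketch are just language queries over one shared map, robots with different morphologies can pass different obstacle label sets to `cells_for` without rebuilding the map.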
Mar-23-2023, 19:09:50 GMT