Open-vocabulary Queryable Scene Representations for Real World Planning
Chen, Boyuan, Xia, Fei, Ichter, Brian, Rao, Kanishka, Gopalakrishnan, Keerthana, Ryoo, Michael S., Stone, Austin, Kappler, Daniel
–arXiv.org Artificial Intelligence
Abstract-- Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. NLMap first establishes a natural language queryable scene representation with Visual Language models (VLMs). An LLM based object proposal module parses instructions and proposes involved objects to query the scene representation for object availability and location. An LLM planner then plans with such information about the scene. We propose an open-vocabulary and queryable scene representation for real-world planning. The returned object presence and location are used for LLM-based planning. It has to first identify relevant objects and upon it. Recent progress in large language models (LLMs), locations within the scene (e.g., the watering can, the sink, and has shown impressive few-shot performance in language each potential plant) and then plan over these objects in sequential comprehension, semantic understanding, and reasoning, as order (get the watering can, then go the sink, and then fill it well as application to robotics problems like planning [5]-[7] up), conditioning on its affordances (e.g., can it carry a full and instruction following [8]. Using such models in embodied watering can), and conditioning on the scene (e.g., how many settings can provide significant challenges, most critically because plants there are, and where are they).
arXiv.org Artificial Intelligence
Oct-15-2022
- Country:
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Technology: