PaLM-E: An Embodied Multimodal Language Model

Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

arXiv.org Artificial Intelligence 

Large language models have been demonstrated to perform complex tasks. However, enabling general inference in the real world, e.g. for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Inputs to our embodied language model are multi-modal sentences that interleave visual, continuous state estimation, and textual input encodings. We train these encodings end-to-end, in conjunction with a pretrained large language model, for multiple embodied tasks including sequential robotic manipulation planning, visual question answering, and captioning.

Large language models (LLMs) demonstrate strong reasoning capabilities across various domains, including dialogue (Glaese et al., 2022; Thoppilan et al., 2022), step-by-step reasoning (Wei et al., 2022; Kojima et al., 2022), math problem solving (Lewkowycz et al., 2022; Polu et al., 2022), and code writing (Chen et al., 2021a). However, a limitation of such models for inference in the real world is the issue of grounding: while training LLMs on massive textual data may lead to representations that relate to our physical world, connecting those representations to real-world visual and physical sensor modalities is essential to solving a wider range of grounded real-world problems in computer vision and robotics (Tellex et al., 2020).
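
To illustrate the idea of "multi-modal sentences", the following is a minimal sketch (not the authors' implementation): continuous observations are projected into a few vectors in the LLM's word-embedding space and spliced into the token-embedding sequence at placeholder positions, so the pretrained language model consumes them like ordinary tokens. The module name, dimensions, and number of soft tokens per observation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalSentenceBuilder(nn.Module):
    """Hypothetical sketch: interleave continuous-observation embeddings with text embeddings."""

    def __init__(self, llm_embed_dim=4096, obs_feature_dim=512, tokens_per_obs=4):
        super().__init__()
        # Assumed projector: maps one featurised observation (e.g. from a ViT
        # or a state estimator) to a few "soft tokens" in the LLM embedding space.
        self.obs_projector = nn.Linear(obs_feature_dim, llm_embed_dim * tokens_per_obs)
        self.tokens_per_obs = tokens_per_obs
        self.llm_embed_dim = llm_embed_dim

    def forward(self, text_embeds, placeholder_mask, obs_features):
        """Splice observation embeddings into a text-embedding sequence.

        text_embeds:      (seq_len, llm_embed_dim) word embeddings of the prompt
        placeholder_mask: (seq_len,) bool, True where an observation placeholder sits
        obs_features:     (num_obs, obs_feature_dim) continuous observations, in order
        """
        obs_embeds = self.obs_projector(obs_features)                    # (num_obs, D * k)
        obs_embeds = obs_embeds.view(-1, self.tokens_per_obs,
                                     self.llm_embed_dim)                 # (num_obs, k, D)
        pieces, obs_idx = [], 0
        for t in range(text_embeds.shape[0]):
            if placeholder_mask[t]:
                pieces.append(obs_embeds[obs_idx])                       # insert k soft tokens
                obs_idx += 1
            else:
                pieces.append(text_embeds[t:t + 1])                      # keep the word token
        # The resulting sequence is fed to the (pretrained) LLM exactly like
        # ordinary token embeddings; the projector is trained end-to-end on
        # the downstream embodied tasks.
        return torch.cat(pieces, dim=0)
```

In this sketch only the projector is necessarily trainable; whether the underlying LLM is frozen or fine-tuned is a separate design choice and not fixed by the code above.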
