Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling

Jinghan Li, Zhicheng Sun, Fei Li, Cao Sheng, Jiazhong Yu, Yadong Mu

arXiv.org Artificial Intelligence 

In the endeavor to make autonomous robots act on their own, task planning is a major challenge: it requires translating high-level task descriptions into long-horizon action sequences. Despite recent advances, language model agents remain prone to planning errors and limited in their ability to plan ahead. To address these limitations in robotic planning, we advocate a self-refining scheme that iteratively refines a draft plan until an equilibrium is reached. Remarkably, this process can be optimized end to end from an analytical perspective, without curating additional verifiers or reward models, which allows us to train self-refining planners in a simple supervised learning fashion. Meanwhile, a nested equilibrium sequence modeling procedure is devised for efficient closed-loop planning that incorporates useful feedback from the environment (or an internal world model). Our method is evaluated on the VirtualHome-Env benchmark, showing strong performance and better scaling with inference computation.

Based on their extensive world knowledge, LLM agents seem close to autonomously performing robotic tasks, such as in household scenarios. However, growing evidence shows that existing LLM agents struggle with task planning (Kaelbling & Lozano-Pérez, 2011), i.e., decomposing a high-level task into mid-level actions. While this problem requires long-horizon planning as well as attention to environmental feedback, LLMs are often limited by: (1) unidirectional dependency: due to autoregressive generation, earlier tokens cannot attend to future tokens, resulting in a limited ability to plan ahead (Wu et al., 2024a); (2) lack of error correction for already generated outputs, unless a heavyweight System-2 procedure is added; (3) a fixed forward process, which hinders allocating more inference computation to further improve planning performance. These inherent limitations make LLMs inefficient for closed-loop long-horizon robotic planning.

To address the above challenges of LLM planners in closed-loop long-horizon planning, we advocate self-refinement (Welleck et al., 2023; Shinn et al., 2023; Kim et al., 2023; Madaan et al., 2023), which iteratively improves a previously generated plan. The reasons are threefold: (1) bidirectional dependency: because the output is conditioned on a previous draft plan, it can attend to all tokens of that (older) plan, improving the ability to plan ahead; (2) internal error correction, which allows implicit self-correction within a forward pass, without an explicit, heavyweight System-2 procedure; (3) dynamic computation allocation, by iterating the self-refinement process until convergence.
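To make the fixed-point view concrete, the following is a minimal sketch of equilibrium-style self-refinement at inference time. It assumes a generic refiner callable (e.g., an LLM wrapped behind a hypothetical `refine(task, draft)` function); the names, the plan representation, and the convergence test are illustrative assumptions, not the paper's exact implementation.

```python
from typing import Callable, List, Optional

Plan = List[str]  # a plan as a sequence of mid-level action strings (assumed representation)

def equilibrium_refine(
    refine: Callable[[str, Plan], Plan],  # hypothetical refiner: (task, draft) -> new plan
    task: str,
    init_plan: Optional[Plan] = None,
    max_iters: int = 8,
) -> Plan:
    """Iterate the refiner on its own output until a fixed point (equilibrium) is reached.

    Because each pass conditions on the full previous draft, every action can attend to
    the whole plan, and extra iterations trade inference computation for plan quality.
    """
    plan: Plan = init_plan or []
    for _ in range(max_iters):
        new_plan = refine(task, plan)
        if new_plan == plan:  # output equals input -> equilibrium reached
            break
        plan = new_plan
    return plan
```

Consistent with the abstract, such a refiner can in principle be trained in a simple supervised fashion (mapping drafts to ground-truth plans) without a separate verifier or reward model; the details of that analytical, end-to-end optimization are in the paper itself.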
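The nested, closed-loop variant re-runs the same equilibrium iteration after each executed action, warm-starting from the previously converged plan so that feedback from the environment (or an internal world model) is incorporated cheaply. Again a hedged sketch under the same assumptions; the feedback format and the step budgets (`max_steps`, `max_iters`) are placeholders.

```python
from typing import Callable, List

Plan = List[str]  # a plan as a sequence of mid-level action strings (assumed representation)

def closed_loop_plan(
    refine: Callable[[str, Plan, str], Plan],  # hypothetical refiner: (task, draft, feedback) -> new plan
    execute: Callable[[str], str],             # environment or world-model step: action -> feedback string
    task: str,
    max_steps: int = 50,
    max_iters: int = 8,
) -> List[str]:
    """Nested loop: re-refine the remaining plan to equilibrium after every executed action.

    The previously converged plan is reused as the next draft (a warm start), so re-planning
    after feedback typically needs only a few refinement iterations.
    """
    executed: List[str] = []
    plan: Plan = []
    feedback = ""
    for _ in range(max_steps):
        # inner equilibrium iteration, conditioned on the latest feedback
        for _ in range(max_iters):
            new_plan = refine(task, plan, feedback)
            if new_plan == plan:
                break
            plan = new_plan
        if not plan:  # an empty remaining plan is treated here as task completion
            break
        action, plan = plan[0], plan[1:]
        feedback = execute(action)
        executed.append(action)
    return executed
```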
