Improving Large Language Model Planning with Action Sequence Similarity

Zhao, Xinran, Sedghi, Hanie, Bohnet, Bernd, Schuurmans, Dale, Nova, Azade

arXiv.org Artificial Intelligence 

Planning is essential for artificial intelligence systems to look ahead and proactively determine a course of action to reach objectives in the virtual and real world. However, it remains unclear which signals in the context influence model performance. In this work, we explore how to improve model planning capability through in-context learning (ICL), and specifically which signals can help select the exemplars. Through extensive experiments, we observe that the commonly used problem similarity may yield false positives: exemplars whose problems look similar but whose plans differ drastically, which can mislead the model. In response, we propose to sample and filter exemplars using plan-side action sequence similarity (AS). We propose GRASE-DC, a two-stage pipeline that first re-samples high-AS exemplars and then curates the selected exemplars with dynamic clustering on AS to balance relevance and diversity. Our experimental results confirm that GRASE-DC achieves significant performance improvements on various planning tasks (up to ~11-40 points of absolute accuracy improvement, with 27.3% fewer exemplars needed on average). Using simpler problems as exemplars, GRASE-DC can further boost planning accuracy on harder problems by ~24 absolute points over a random baseline, which demonstrates its ability to generalize to out-of-distribution problems.

Planning is important for intelligent agents when exploring the environment and conducting complex multi-hop actions to achieve their goals strategically. Classical studies in planning mainly leverage search-based algorithms and reinforcement learning to tackle these problems. Recent advances in utilizing Large Language Models (LLMs) as the backbone of agents, e.g., for games (ToT, Yao et al., 2023) and travel scheduling (Xie et al., 2024), call for improving model planning ability to facilitate various downstream applications.
Recent work achieves good performance on LLM planning by combining search-based algorithms with LLM decoding (Besta et al., 2024; Silver et al., 2024; Lehnert et al., 2024); however, multiple rounds of prompting in a tree structure, e.g., Monte-Carlo Tree Search (MCTS), can lead to high inference cost (Yao et al., 2023). To further improve effectiveness and efficiency, this paper focuses on improving the planning capability of LLMs with direct prompting via in-context learning (ICL) (Brown et al., 2020). We seek signals that help select good demonstrative task-plan examples for the context, i.e., exemplars (Rubin et al., 2022).
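To make the idea of selecting exemplars by action sequence similarity more concrete, the following is a minimal Python sketch, not the GRASE-DC implementation. It rests on several assumptions that do not come from the text above: AS is approximated with the matching ratio from Python's `difflib.SequenceMatcher`, a reference action sequence for the query is assumed to be available, and a simple greedy near-duplicate filter stands in for the dynamic clustering step. The names `action_similarity`, `select_exemplars`, `max_pairwise`, and the `name`/`plan` fields are hypothetical.

```python
from difflib import SequenceMatcher


def action_similarity(plan_a, plan_b):
    """Similarity in [0, 1] between two action sequences.

    Assumption: difflib's matching ratio is used as a stand-in for AS.
    """
    return SequenceMatcher(None, plan_a, plan_b, autojunk=False).ratio()


def select_exemplars(reference_plan, candidates, k=2, max_pairwise=0.9):
    """Pick up to k exemplars that are relevant to the query yet diverse.

    Stage 1 (relevance): rank candidates by similarity between their plan
    and the reference plan for the query.
    Stage 2 (diversity): greedily skip a candidate whose plan is nearly
    identical to one already chosen (a stand-in for clustering on AS).
    """
    ranked = sorted(
        candidates,
        key=lambda c: action_similarity(reference_plan, c["plan"]),
        reverse=True,
    )
    chosen = []
    for cand in ranked:
        if len(chosen) == k:
            break
        # Keep the candidate only if it is not a near-duplicate of a pick.
        if all(
            action_similarity(cand["plan"], c["plan"]) <= max_pairwise
            for c in chosen
        ):
            chosen.append(cand)
    return chosen


# Hypothetical Blocksworld-style candidate pool.
candidates = [
    {"name": "A", "plan": ["pick a", "stack a b"]},
    {"name": "B", "plan": ["pick a", "stack a b"]},  # duplicate plan of A
    {"name": "C", "plan": ["pick a", "put a"]},
    {"name": "D", "plan": ["unstack c d", "put c"]},
]
reference_plan = ["pick a", "stack a b"]
selected = select_exemplars(reference_plan, candidates, k=2)
print([c["name"] for c in selected])
```

In this toy pool, candidate B has the same plan as A and is skipped by the diversity filter, so the second slot goes to the next most similar plan. The threshold `max_pairwise` is an illustrative knob; a clustering-based curation step would instead choose how many exemplars to keep from the structure of the similarities.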
