Meta-Prompt Optimization for LLM-Based Sequential Decision Making
Kong, Mingze, Wang, Zhiyong, Shu, Yao, Dai, Zhongxiang
–arXiv.org Artificial Intelligence
Large language models (LLMs) have recently been employed as agents to solve sequential decision-making tasks such as Bayesian optimization and multi-armed bandits (MAB). These works usually adopt an LLM for sequential action selection by providing it with a fixed, manually designed meta-prompt. However, numerous previous works have found that the prompt has a significant impact on the performance of the LLM, which calls for a method to automatically optimize the meta-prompt for LLM-based agents. Unfortunately, the non-stationarity in the reward observations during LLM-based sequential decision-making makes meta-prompt optimization highly challenging. To address this challenge, we draw inspirations from adversarial bandit algorithms, which are inherently capable of handling non-stationary reward observations. Building on this foundation, we propose our EXPonential-weight algorithm for prompt Optimization} (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt for LLM-based agents. We also extend EXPO to additionally optimize the exemplars (i.e., history of interactions) in the meta-prompt to further enhance the performance, hence introducing our EXPO-ES algorithm. We use extensive experiments to show that our algorithms significantly improve the performance of LLM-based sequential decision-making.
arXiv.org Artificial Intelligence
Feb-2-2025
- Country:
- Asia > China
- Guangdong Province > Shenzhen (0.04)
- Hong Kong (0.04)
- Asia > China
- Genre:
- Research Report > New Finding (0.46)
- Technology: