Disentangling Exploration of Large Language Models by Optimal Exploitation
Tim Grams, Patrick Betz, Christian Bartelt
–arXiv.org Artificial Intelligence
Exploration is a crucial skill for self-improvement and open-ended problem-solving. However, it remains uncertain whether large language models can effectively explore the state-space. Existing evaluations predominantly focus on the trade-off between exploration and exploitation, often assessed in multi-armed bandit problems. In contrast, this work isolates exploration as the sole objective, tasking the agent with delivering information that enhances future returns. For the evaluation, we propose to decompose missing rewards into exploration and exploitation components by measuring the optimal achievable return for the states already explored. Our experiments with various LLMs reveal that most models struggle to sufficiently explore the state-space and that weak exploration is insufficient. We observe a positive correlation between model size and exploration performance, with larger models demonstrating superior capabilities. Furthermore, we show that our decomposition provides insights into differences in behaviors driven by agent instructions during prompt engineering, offering a valuable tool for refining LLM performance in exploratory tasks.

Recently, large language models (LLMs) have demonstrated promising results in various decision-making tasks such as web browsing (Yao et al., 2022; Shinn et al., 2024; Ma et al., 2023), game-playing (Paglieri et al., 2024), and tasks in simulated households (Yao et al., 2022; Shinn et al., 2024). In this way, LLMs act as agents that observe states and take actions in different environments. Through their vast internal knowledge base and autoregressive in-context reasoning capabilities, the models are supposed to quickly adapt to new tasks.
However, previous work has shown that LLMs struggle to solve increasingly complex environments due to several limitations: for example, their ability to learn from mistakes is often limited (Huang et al., 2023), and they have difficulties with planning over long horizons (Kambhampati et al., 2024). These examples emphasize that understanding LLM abilities is essential for risk assessment in real-life applications and for future development.
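The proposed decomposition can be illustrated with a minimal sketch. The idea is that the shortfall from the optimal return splits into an exploration gap (value reachable only through states the agent never visited) and an exploitation gap (value lost within the states it did visit). The function name and the toy numbers below are illustrative assumptions, not the paper's implementation.

```python
def decompose_regret(optimal_return, optimal_explored_return, agent_return):
    """Split missing reward into exploration and exploitation components.

    optimal_return          -- best return achievable in the full state-space
    optimal_explored_return -- best return restricted to already-explored states
    agent_return            -- return the agent actually obtained
    """
    # Value that was out of reach because parts of the state-space
    # were never explored:
    exploration_gap = optimal_return - optimal_explored_return
    # Value left on the table within the explored states:
    exploitation_gap = optimal_explored_return - agent_return
    return exploration_gap, exploitation_gap


# Toy numbers: the full optimum is 10, the best return using only the
# visited states is 7, and the agent collected 5.
expl, expt = decompose_regret(10.0, 7.0, 5.0)
print(expl, expt)  # 3.0 2.0
```

In practice, computing `optimal_explored_return` requires solving (or approximating) the optimal policy restricted to the explored portion of the state-space, which is the measurement the paper proposes.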
Jan-15-2025