Learning to Reason from Feedback at Test-Time
Li, Yanyang; Lyu, Michael; Wang, Liwei
arXiv.org Artificial Intelligence
Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment, guided by feedback, is often required for success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries that discard prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to exploit feedback effectively. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.
Feb-16-2025