Effective Large Language Model Debugging with Best-first Tree Search
Song, Jialin, Raiman, Jonathan, Catanzaro, Bryan
arXiv.org Artificial Intelligence
The code-writing abilities of large language models (LLMs) are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental difference between how an LLM writes code and how a human programmer does is that the LLM cannot consistently spot and fix bugs. Debugging is a crucial skill for programmers, and it enables iterative refinement of code towards a correct implementation. In this work, we propose a novel algorithm that enables LLMs to debug their code via self-reflection and search, where a model attempts to identify its previous mistakes. Our key contributions include a best-first tree search algorithm with self-reflections (BESTER) that achieves state-of-the-art Pass@1 on three code generation benchmarks. BESTER maintains its superiority when pass rates are measured with the additional inference costs incurred by tree search taken into account.
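The abstract describes the approach only at a high level, so the following is a minimal sketch of a generic best-first search over candidate programs, not the authors' BESTER implementation. It assumes two hypothetical callables supplied by the caller: `score`, which returns the fraction of tests a program passes, and `refine`, which stands in for a self-reflection step followed by generation of repaired candidates. The function name `best_first_debug` and the `max_expansions` budget are likewise illustrative assumptions.

```python
import heapq
from typing import Callable, List


def best_first_debug(
    initial_program: str,
    score: Callable[[str], float],       # hypothetical: fraction of tests passed, in [0, 1]
    refine: Callable[[str], List[str]],  # hypothetical: reflect on a program, return repair candidates
    max_expansions: int = 10,            # assumed search budget (number of nodes expanded)
) -> str:
    """Expand the highest-scoring candidate first; return the best program found."""
    best_score = score(initial_program)
    best_program = initial_program
    if best_score >= 1.0:
        return best_program

    counter = 0  # tie-breaker so the heap never compares program strings
    # heapq is a min-heap, so scores are negated to pop the best candidate first.
    frontier = [(-best_score, counter, initial_program)]

    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, program = heapq.heappop(frontier)
        for child in refine(program):
            child_score = score(child)
            if child_score >= 1.0:
                return child  # all tests pass
            if child_score > best_score:
                best_score, best_program = child_score, child
            counter += 1
            heapq.heappush(frontier, (-child_score, counter, child))

    # Budget exhausted: fall back to the best partial solution seen.
    return best_program
```

In practice, `refine` would wrap calls to an LLM and `score` would run a test suite; both are left abstract here because the abstract does not specify them.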
Jul-26-2024
- Country:
- Asia > British Indian Ocean Territory
- Diego Garcia (0.04)
- Europe > Italy
- Calabria > Catanzaro Province > Catanzaro (0.04)
- Genre:
- Research Report (1.00)
- Technology: