Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf, Binyamin Rothberg, Dorin Shteyman, Amnon Shashua
arXiv.org Artificial Intelligence
A common practice in large language model (LLM) usage for complex analytical tasks, such as code generation, is to sample a solution to the entire task within the model's context window. Previous works have shown that subtask decomposition within the model's context (chain of thought) is beneficial for solving such tasks. In this work, we point to a limitation of LLMs' ability to perform several subtasks within the same context window, an in-context hardness of composition, which suggests an advantage for distributing a decomposed problem across a multi-agent system of LLMs. The hardness of composition is quantified by a generation complexity metric, i.e., the number of LLM generations required to sample at least one correct solution. We find a gap between the generation complexity of solving a compositional problem within the same context and that of distributing it among multiple agents, a gap that grows exponentially with the solution's length. We prove our results theoretically and demonstrate them empirically.

Yet LLMs' analytical skills, such as coding capabilities, are slow to develop: Chen et al. (2021b), Li et al. (2022a), Alp (2023), and Ridnik et al. (2024) show that even with millions of generations, LLMs may not produce a single correct solution to competitive coding problems.
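The generation-complexity gap can be made concrete with a back-of-the-envelope model. The sketch below is ours, not the paper's code, and rests on illustrative assumptions: each of m subtasks succeeds independently per sample with probability p, so the number of generations until a first correct solution is geometric; a composed generation must get all m subtasks right in one shot, while distributed agents each solve one subtask.

```python
# Minimal sketch of the generation-complexity gap (our illustrative model,
# not the paper's formalism): per-subtask success probability p, m subtasks.

def expected_generations(success_prob: float) -> float:
    """Expected number of i.i.d. samples until the first correct solution
    (mean of a geometric distribution)."""
    return 1.0 / success_prob

def composed_complexity(p: float, m: int) -> float:
    # A single context must solve all m subtasks at once: success
    # probability p**m, so expected generations grow exponentially in m.
    return expected_generations(p ** m)

def distributed_complexity(p: float, m: int) -> float:
    # Each agent solves one subtask independently: total expected work is
    # the sum of m geometric means, i.e. linear in m.
    return m * expected_generations(p)

if __name__ == "__main__":
    p = 0.5  # hypothetical per-subtask success probability
    for m in (1, 2, 4, 8, 16):
        print(f"m={m:2d}  composed={composed_complexity(p, m):>10.0f}  "
              f"distributed={distributed_complexity(p, m):>6.0f}")
```

Under these assumptions the composed cost at p = 0.5 reaches 65,536 expected generations for m = 16 subtasks, while the distributed cost is 32, matching the exponential-versus-linear gap the abstract describes.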
Oct-3-2024