ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

Imajuku, Yuki; Horie, Kohki; Iwata, Yoichi; Aoki, Kensho; Takahashi, Naohiro; Akiba, Takuya

Oct-7-2025–arXiv.org Artificial Intelligence

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of consistency across problems and long-horizon problem-solving capabilities. This highlights the need for this benchmark to foster future AI advancements.

large language model, machine learning, natural language

Country:
- Asia > Japan (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology > Services (0.45)
- Transportation > Freight & Logistics Services (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning
    - Search (1.00)
    - Agents (1.00)
    - Optimization (0.86)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.99)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)