AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

Zhang, Genghan, Zhu, Shaowei, Wei, Anjiang, Song, Zhenyu, Nie, Allen, Jia, Zhen, Vijaykumar, Nandita, Wang, Yida, Olukotun, Kunle

Nov-21-2025–arXiv.org Artificial Intelligence

We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI acclerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from $49\%$ to $61\%$ on Trainium 1 and from $45\%$ to $59\%$ on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being $26\times$ cheaper.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

Nov-21-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - Canada > Ontario
    - Toronto (0.14)
  - United States > California
    - Santa Clara County > Palo Alto (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found