PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization
Lei, Kelun; Yang, Hailong; Zhang, Huaitao; You, Xin; Zhang, Kaige; Luan, Zhongzhi; Liu, Yi; Qian, Depei
arXiv.org Artificial Intelligence
Abstract: Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel generation, yet most existing systems rely solely on correctness or execution-time feedback and cannot reason about low-level performance bottlenecks. In this paper, we introduce PRAGMA, a profile-guided AI kernel generation framework that integrates execution feedback and fine-grained hardware profiling into the reasoning loop. PRAGMA enables LLMs to identify performance bottlenecks, preserve historical best versions, and iteratively refine code quality. Results show that PRAGMA consistently outperforms N-PRAGMA, the baseline with profiling disabled, and achieves average speedups of 2.81× and 2.30× over Torch on CPU and GPU platforms, respectively.

Optimizing computational kernels is fundamental to achieving high performance in modern AI and HPC systems. Traditionally, reaching near-peak efficiency has required extensive manual tuning and deep expertise in architecture-specific optimization, making the development and maintenance of high-performance kernels both labor-intensive and error-prone.
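As a rough illustration of the loop the abstract describes (check correctness, measure time, gather profiling feedback, keep the best version so far, refine again), the control flow might look like the sketch below. This is not PRAGMA's actual implementation: `Candidate`, `refine`, and the four callables `propose`, `check`, `measure`, and `profile` are hypothetical names introduced here, and the LLM and profiler are abstracted behind plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """A kernel version that passed the correctness check (hypothetical type)."""
    source: str
    runtime_ms: float


def refine(
    initial: str,
    propose: Callable[[str, str], str],   # (base source, feedback) -> new source; stands in for an LLM
    check: Callable[[str], bool],         # correctness test
    measure: Callable[[str], float],      # execution time in milliseconds
    profile: Callable[[str], str],        # textual profiling summary
    rounds: int = 3,
) -> Optional[Candidate]:
    """Evaluate the initial kernel plus `rounds` refinements; return the fastest correct one."""
    best: Optional[Candidate] = None
    candidate = initial
    for step in range(rounds + 1):
        if check(candidate):
            runtime = measure(candidate)
            feedback = profile(candidate)
            # Preserve the historical best version.
            if best is None or runtime < best.runtime_ms:
                best = Candidate(candidate, runtime)
        else:
            feedback = "correctness check failed"
        if step == rounds:
            break
        # Refine from the best known version when one exists,
        # passing along the feedback from the latest attempt.
        base = best.source if best is not None else candidate
        candidate = propose(base, feedback)
    return best


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Speedup of the optimized kernel relative to a baseline runtime."""
    return baseline_ms / optimized_ms
```

In this sketch a slower or incorrect candidate never replaces the stored best version, so the result can only stay the same or improve from one round to the next. A variant that skips `profile` and returns only timing feedback would correspond loosely to the profiling-disabled baseline mentioned in the abstract.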
Nov-25-2025