KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini
arXiv.org Artificial Intelligence
Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore using language models (LMs) to automate kernel generation. We introduce KernelBench, an open-source framework for evaluating LMs' ability to write fast and correct kernels on a suite of 250 carefully selected PyTorch ML workloads. KernelBench represents a real-world engineering environment, and progress on the benchmark translates directly into faster practical kernels. We introduce a new evaluation metric, fast_p, which measures the percentage of generated kernels that are functionally correct and offer a speedup greater than an adjustable threshold p over the PyTorch baseline. Our experiments across various state-of-the-art models and test-time methods show that frontier reasoning models perform best out of the box but still fall short overall, matching the PyTorch baseline in fewer than 20% of cases. While we show that results improve when execution and profiling feedback is leveraged during iterative refinement, KernelBench remains a challenging benchmark, and its difficulty increases as the speedup threshold p is raised.
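For concreteness, the fast_p metric described in the abstract can be sketched as below. This is a minimal illustration, not KernelBench's actual API: the KernelResult structure, its field names, and the fast_p function signature are all assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KernelResult:
    correct: bool            # passed functional correctness checks
    baseline_time_ms: float  # PyTorch reference runtime for the workload
    kernel_time_ms: float    # runtime of the LM-generated kernel


def fast_p(results: List[KernelResult], p: float) -> float:
    """Fraction of generated kernels that are functionally correct
    and achieve a speedup greater than p over the baseline."""
    if not results:
        return 0.0
    hits = sum(
        1 for r in results
        if r.correct and (r.baseline_time_ms / r.kernel_time_ms) > p
    )
    return hits / len(results)
```

Under this reading, fast_1 measures how often a generated kernel is correct and at least matches the PyTorch baseline; raising p makes the metric strictly harder to satisfy.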
Feb-14-2025