KForge: Program Synthesis for Diverse AI Hardware Accelerators
Sereda, Taras; St. John, Tom; Bartan, Burak; Serrino, Natalie; Katti, Sachin; Asgar, Zain
arXiv.org Artificial Intelligence
GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a generation agent that produces and iteratively refines programs through compilation and correctness feedback, and a performance analysis agent that interprets profiling data to guide optimization. This agent-based architecture requires only a single-shot example to target new platforms. We make three key contributions: (1) introducing an iterative refinement system where the generation agent and performance analysis agent collaborate through functional and optimization passes, interpreting diverse profiling data (from programmatic APIs to GUI-based tools) to generate actionable recommendations that guide program synthesis for arbitrary accelerators; (2) demonstrating that the generation agent effectively leverages cross-platform knowledge transfer, where a reference implementation from one architecture substantially improves generation quality for different hardware targets; and (3) validating the platform-agnostic nature of our approach by demonstrating effective program synthesis across fundamentally different parallel computing platforms: NVIDIA CUDA and Apple Metal.
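The two-agent loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not KForge's actual implementation: the names `refine`, `Feedback`, `generate`, `check`, and `analyze` are hypothetical, and the toy agents below stand in for the LLM calls, compiler, and profiler.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    """Result of compiling and testing a candidate program (hypothetical type)."""
    compiled: bool
    correct: bool
    log: str


def refine(
    generate: Callable[[str, Optional[str], Optional[str]], str],
    check: Callable[[str], Feedback],
    analyze: Callable[[str], Optional[str]],
    reference_example: str,
    max_iters: int = 5,
) -> Optional[str]:
    """Alternate functional and optimization passes (assumed control flow).

    generate: stands in for the generation agent.
    check:    stands in for compilation plus correctness testing.
    analyze:  stands in for the performance analysis agent; returns a
              recommendation, or None when it has nothing to suggest.
    """
    program: Optional[str] = None
    hint: Optional[str] = None
    best: Optional[str] = None
    for _ in range(max_iters):
        program = generate(reference_example, program, hint)
        feedback = check(program)
        if not (feedback.compiled and feedback.correct):
            # Functional pass: feed the error log back to the generator.
            hint = feedback.log
            continue
        best = program
        # Optimization pass: ask the analysis agent for a recommendation.
        hint = analyze(program)
        if hint is None:
            break
    return best


# Toy stand-ins so the sketch runs without any LLM, compiler, or profiler.
def toy_generate(example: str, previous: Optional[str], hint: Optional[str]) -> str:
    if previous is None:
        return "v0-broken"
    if hint == "compile error":
        return "v1-correct"
    if hint == "use shared memory":
        return "v2-optimized"
    return previous


def toy_check(program: str) -> Feedback:
    ok = program != "v0-broken"
    return Feedback(compiled=ok, correct=ok, log="" if ok else "compile error")


def toy_analyze(program: str) -> Optional[str]:
    return "use shared memory" if program == "v1-correct" else None


result = refine(toy_generate, toy_check, toy_analyze, "reference kernel")
```

In this toy run the first candidate fails the functional pass, the second passes and receives a recommendation, and the third is returned once the analysis stand-in has no further suggestions. The single `reference_example` argument mirrors the abstract's single-shot example for a new platform.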
Nov-18-2025
- Country:
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Information Technology > Hardware (0.37)
- Technology: