Reasoning Compiler: LLM-Guided Optimizations for Efficient Model Serving
Neural Information Processing Systems
While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models remains a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven substantial performance improvements, but existing compilers struggle with neural workloads because the space of possible transformations is exponentially large and highly interdependent. Existing stochastic search techniques can be effective, but they are often sample-inefficient and fail to leverage the structural context underlying compilation decisions. We investigate whether reasoning with large language models (LLMs), without any retraining, can leverage the context-aware decision space of compiler optimizations to significantly improve sample efficiency.
Jun-20-2026, 09:27:15 GMT
- Country:
- North America > United States > California (0.15)
- Genre:
- Research Report
- New Finding (0.48)
- Experimental Study (0.34)
- Technology: