Accelerating Latency-Critical Applications with AI-Powered Semi-Automatic Fine-Grained Parallelization on SMT Processors
–arXiv.org Artificial Intelligence
Latency-critical applications tend to show low utilization of functional units due to frequent cache misses and mispredictions during speculative execution in high-performance superscalar processors. However, due to significant impact on single-thread performance, Simultaneous Multithreading (SMT) technology is rarely used with heavy threads of latency-critical applications. In this paper, we explore utilization of SMT technology to support fine-grained parallelization of latency-critical applications. Following the advancements in the development of Large Language Models (LLMs), we introduce Aira, an AI-powered Parallelization Adviser. To implement Aira, we extend AI Coding Agent in Cursor IDE with additional tools connected through Model Context Protocol, enabling end-to-end AI Agent for parallelization. Additional connected tools enable LLM-guided hotspot detection, collection of dynamic dependencies with Dynamic Binary Instrumentation, SMT-aware performance simulation to estimate performance gains. We apply Aira with Relic parallel framework for fine-grained task parallelism on SMT cores to parallelize latency-critical benchmarks representing real-world applications used in industry. We show 17% geomean performance gain from parallelization of latency-critical benchmarks using Aira with Relic framework.
arXiv.org Artificial Intelligence
Sep-3-2025
- Country:
- Asia
- Russia (0.04)
- Singapore > Central Region
- Singapore (0.04)
- Europe
- Italy (0.04)
- Portugal (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- Sweden (0.04)
- North America > United States
- Arizona > Maricopa County
- Phoenix (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Arizona > Maricopa County
- Asia
- Genre:
- Research Report (0.40)
- Technology: