Serving LLMs in HPC Clusters: A Comparative Study of Qualcomm Cloud AI 100 Ultra and NVIDIA Data Center GPUs

Sada, Mohammad Firas, Graham, John J., Khoda, Elham E., Tatineni, Mahidhar, Mishin, Dmitry, Gupta, Rajesh K., Wagner, Rick, Smarr, Larry, DeFanti, Thomas A., Würthwein, Frank


The rapid proliferation of large language models (LLMs) has fundamentally transformed scientific computing, enabling breakthroughs across domains from computational biology to materials science. As these models scale to hundreds of billions of parameters, high-performance computing (HPC) facilities face mounting challenges in providing sustainable, cost-effective inference capabilities to diverse research communities. Traditional GPU-centric approaches, while delivering exceptional throughput, impose significant barriers in power consumption, cooling requirements, and capital investment, barriers that are particularly problematic for shared research cyberinfrastructures serving hundreds of concurrent users.

The National Research Platform (NRP) exemplifies both these challenges and the opportunities they present. As a federated Kubernetes-based infrastructure supporting more than 300 research groups across more than 100 sites, the NRP must balance competing demands: delivering high-performance AI capabilities while managing constrained power budgets, enabling fine-grained resource allocation for multi-tenant workloads, and providing cost-effective access to emerging AI models for diverse scientific applications [1, 2].
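To make the fine-grained, multi-tenant allocation concrete, the sketch below shows how accelerators are typically requested on a Kubernetes cluster such as the NRP: device plugins expose accelerators as named, countable resources, and tenant pods request them alongside CPU and memory. This example is not from the paper; the image, namespace, and pod names are hypothetical, and while nvidia.com/gpu is the standard NVIDIA device-plugin resource name, the corresponding Qualcomm Cloud AI resource name shown here is an assumption that may differ by deployment.

    # Minimal sketch (assumptions noted above) of requesting accelerator
    # units from a multi-tenant Kubernetes cluster via the device-plugin
    # resource model, using the official kubernetes Python client.
    from kubernetes import client, config

    def make_inference_pod(name: str, image: str,
                           accelerator: str, count: int = 1) -> client.V1Pod:
        """Build a Pod spec requesting `count` units of one accelerator type."""
        container = client.V1Container(
            name="llm-server",
            image=image,
            resources=client.V1ResourceRequirements(
                # Device plugins schedule whole accelerator units per pod.
                limits={accelerator: str(count)},
                requests={"cpu": "4", "memory": "32Gi"},
            ),
        )
        return client.V1Pod(
            metadata=client.V1ObjectMeta(name=name,
                                         labels={"app": "llm-inference"}),
            spec=client.V1PodSpec(containers=[container],
                                  restart_policy="Never"),
        )

    if __name__ == "__main__":
        config.load_kube_config()  # assumes kubeconfig access to the cluster
        pod = make_inference_pod(
            name="qaic-llm-demo",
            image="example.org/llm-server:latest",  # hypothetical image
            accelerator="qualcomm.com/qaic",        # assumed name; NVIDIA GPUs
        )                                           # use "nvidia.com/gpu"
        client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Because both accelerator families appear to the scheduler as interchangeable named resources, the same tenant-facing workflow can target either GPU or Cloud AI 100 nodes, which is what makes a side-by-side comparison of the two backends practical on shared infrastructure.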