Experience Deploying Containerized GenAI Services at an HPC Center

Beltre, Angel M., Ogden, Jeff, Pedretti, Kevin

Sep-30-2025–arXiv.org Artificial Intelligence

Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected via web-based APIs. While these components are often containerized and deployed in cloud environments, such capabilities are still emerging at High-Performance Computing (HPC) centers. In this paper, we share our experience deploying GenAI workloads within an established HPC center, discussing the integration of HPC and cloud computing environments. We describe our converged computing architecture that integrates HPC and Kubernetes platforms running containerized GenAI workloads, helping with reproducibility. A case study illustrates the deployment of the Llama Large Language Model (LLM) using a containerized inference server (vLLM) across both Kubernetes and HPC platforms using multiple container runtimes. Our experience highlights practical considerations and opportunities for the HPC container community, guiding future research and tool development.

large language model, machine learning, platform, (20 more...)

arXiv.org Artificial Intelligence

Sep-30-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > New Mexico (0.14)

Genre:
- Workflow (0.95)
- Research Report (0.64)

Industry:
- Information Technology > Services (0.93)
- Energy (0.68)
- Government > Regional Government
  - North America Government > United States Government (0.68)

Technology:
- Information Technology
  - Cloud Computing (1.00)
  - Artificial Intelligence
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (0.96)
    - Machine Learning > Neural Networks
      - Deep Learning > Generative AI (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found