Federated Learning Framework for Scalable AI in Heterogeneous HPC and Cloud Environments
Sangam Ghimire, Paribartan Timalsina, Nirjal Bhurtel, Bishal Neupane, Bigyan Byanju Shrestha, Subarna Bhattarai, Prajwal Gaire, Jessica Thapa, Sudan Jha
arXiv.org Artificial Intelligence
As AI models continue to grow in complexity and size, so does the demand for vast computational resources and access to large-scale distributed datasets. At the same time, growing concerns about data privacy, ownership, and regulatory compliance make it increasingly difficult to centralize data for training. Federated Learning (FL) has emerged as a promising paradigm for addressing these challenges, enabling collaborative model training across multiple data silos without requiring raw data to leave its source. While FL has gained traction in mobile and edge environments, such as smartphones and IoT devices, its application on large-scale computing platforms such as High-Performance Computing (HPC) clusters and cloud infrastructure remains underexplored. Meanwhile, the convergence of HPC and cloud computing is reshaping the landscape of modern data-intensive applications. These hybrid environments combine the raw power and efficiency of HPC with the scalability and flexibility of the cloud, making them well suited to training large AI models. However, this integration brings new challenges: heterogeneous hardware (e.g., Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Tensor Processing Units (TPUs)), inconsistent network performance, dynamic resource availability, and non-uniform data distributions across clients. In this context, deploying FL across such mixed infrastructure is both a timely opportunity and a technical challenge. This paper explores how FL can be adapted and optimized to run efficiently across heterogeneous HPC and cloud environments, with a focus on scalability, system resilience, and performance under non-independent and identically distributed (non-IID) data conditions.
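To make the core FL idea concrete (clients train locally and share only model parameters, never raw data), the following is a minimal sketch of Federated Averaging (FedAvg), a standard FL aggregation scheme. It is not presented as the authors' framework. Assumptions, all hypothetical: a toy linear model y = w*x + b, the helper names `Client`, `fedavg_round`, and `make_client`, silo labels such as "hpc-gpu" that are purely illustrative, random per-round participation as a stand-in for dynamic resource availability, and client-specific input ranges as a stand-in for non-IID data.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical data silo; its (x, y) samples are never sent to the server."""
    name: str
    data: list[tuple[float, float]]

    def local_train(self, w: float, b: float, lr: float = 0.05, epochs: int = 10) -> tuple[float, float]:
        """Full-batch gradient descent on local mean squared error, starting from the global model."""
        n = len(self.data)
        for _ in range(epochs):
            grad_w = sum(2 * (w * x + b - y) * x for x, y in self.data) / n
            grad_b = sum(2 * (w * x + b - y) for x, y in self.data) / n
            w, b = w - lr * grad_w, b - lr * grad_b
        return w, b  # only the updated parameters leave the client


def fedavg_round(clients: list[Client], w: float, b: float,
                 rng: random.Random, participation: float = 0.8) -> tuple[float, float]:
    """One FedAvg round; each client joins with probability `participation` (assumed availability model)."""
    active = [c for c in clients if rng.random() < participation]
    if not active:
        return w, b  # no client available this round; keep the current global model
    total = sum(len(c.data) for c in active)
    new_w = new_b = 0.0
    for c in active:
        cw, cb = c.local_train(w, b)
        share = len(c.data) / total  # FedAvg weights each local model by its sample count
        new_w += share * cw
        new_b += share * cb
    return new_w, new_b


def make_client(name: str, lo: float, hi: float, n: int, rng: random.Random) -> Client:
    """Toy non-IID silo: x is drawn from a client-specific range, and y = 3x + 1."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return Client(name, [(x, 3 * x + 1) for x in xs])


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [
        make_client("hpc-cpu", -2.0, 0.0, 30, rng),
        make_client("hpc-gpu", 0.0, 1.0, 10, rng),
        make_client("cloud-tpu", 1.0, 3.0, 50, rng),
    ]
    w, b = 0.0, 0.0
    for _ in range(300):
        w, b = fedavg_round(clients, w, b, rng)
    print(f"global model: w={w:.3f}, b={b:.3f}")
```

Although no single client sees the full input range, the averaged model should approach the shared underlying relation (w = 3, b = 1) in this noise-free toy setting. Real deployments would replace the scalar model with neural-network weights and the simulated dropout with actual scheduler or network behavior.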
Nov-26-2025