Kubernetes
Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM
Trappen, Tim, Keßler, Robert, Pabel, Roland, Achter, Viktor, Wesner, Stefan
Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging. The use of High-Performance Computing (HPC) has become a prevalent approach for implementing such solutions. However, the classical HPC operating model does not adapt well to the requirements of synchronous, user-facing, dynamic AI application workloads. In this paper, we propose a solution that serves LLMs by integrating vLLM, Slurm and Kubernetes on the supercomputer RAMSES. Initial benchmarks indicate that the proposed architecture scales efficiently for 100, 500 and 1000 concurrent requests, incurring an end-to-end latency overhead of only approximately 500 ms.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Tennessee > Davidson County > Nashville (0.05)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.05)
- Information Technology (0.95)
- Education > Educational Setting (0.50)
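The abstract above reports the load levels but not the scaling policy itself. As a hedged sketch, one plausible policy sizes the vLLM replica pool (one Slurm job per replica) from the observed concurrency; the per-replica ceiling and replica bounds here are invented for illustration, not taken from the paper.

```python
import math

def replicas_needed(concurrent_requests: int,
                    per_replica_ceiling: int = 250,
                    min_replicas: int = 1,
                    max_replicas: int = 8) -> int:
    """Size the vLLM replica pool (one Slurm job per replica) so that
    no replica sees more than per_replica_ceiling concurrent requests."""
    wanted = math.ceil(concurrent_requests / per_replica_ceiling)
    return max(min_replicas, min(max_replicas, wanted))

# At the benchmark's load levels:
# replicas_needed(100) -> 1, replicas_needed(500) -> 2, replicas_needed(1000) -> 4
```

A real controller would additionally react to queue depth or token throughput reported by the inference servers, but the clamp-and-ceil shape of the decision stays the same.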
A CODECO Case Study and Initial Validation for Edge Orchestration of Autonomous Mobile Robots
Zhu, H., Samizadeh, T., Sofia, R. C.
Hongyu Zhu, Tina Samizadeh, Rute C. Sofia (fortiss, Research Institute of the Free State of Bavaria associated with the Technical University of Munich (TUM)). Abstract: Autonomous Mobile Robots (AMRs) increasingly adopt containerized microservices across the Edge-Cloud continuum. While Kubernetes is the de-facto orchestrator for such systems, its assumptions (stable networks, homogeneous resources, and ample compute capacity) do not fully hold in mobile, resource-constrained robotic environments. The paper describes a case study on a smart-manufacturing AMR and performs an initial comparison between CODECO orchestration and standard Kubernetes using a controlled Kubernetes-in-Docker (KinD) environment. Metrics include pod deployment and deletion times, CPU and memory usage, and inter-pod data rates. The observed results indicate that CODECO offers reduced CPU consumption and more stable communication patterns, at the cost of modest memory overhead (10-15%) and slightly increased pod lifecycle latency due to secure overlay initialization. Kubernetes provides declarative configuration, automated scaling, and robust availability mechanisms that make it highly effective in cloud data centers. However, its design assumptions, namely the existence of relatively stable networks, abundant compute resources, and largely static infrastructure, do not fully hold in Edge-Edge and Edge-Cloud environments, where resources can be constrained and heterogeneous.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.24)
- North America > United States (0.04)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Robots > Locomotion (0.61)
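The comparison metrics above (pod deployment and deletion times, memory overhead) reduce to simple relative statistics over repeated runs. A sketch of how such an overhead figure can be computed from raw timings follows; the function name and inputs are hypothetical, not CODECO's tooling.

```python
from statistics import mean

def relative_overhead(baseline_samples, candidate_samples):
    """Relative overhead of a candidate orchestrator over a baseline,
    from repeated measurements (e.g. pod deployment times in seconds
    or memory footprints in MiB)."""
    base = mean(baseline_samples)
    cand = mean(candidate_samples)
    return (cand - base) / base
```

For example, a returned value of 0.125 corresponds to 12.5%, inside the 10-15% memory band the paper reports for CODECO.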
Application Management in C-ITS: Orchestrating Demand-Driven Deployments and Reconfigurations
Zanger, Lukas, Lampe, Bastian, Reiher, Lennart, Eckstein, Lutz
Abstract: Vehicles are becoming increasingly automated and interconnected, enabling the formation of cooperative intelligent transport systems (C-ITS) and the use of offboard services. As a result, cloud-native techniques, such as microservices and container orchestration, play an increasingly important role in their operation. However, orchestrating applications in a large-scale C-ITS poses unique challenges due to the dynamic nature of the environment and the need for efficient resource utilization. In this paper, we present a demand-driven application management approach that leverages cloud-native techniques, specifically Kubernetes, to address these challenges. Taking into account the demands originating from different entities within the C-ITS, the approach enables the automation of processes such as deployment, reconfiguration, update, upgrade, and scaling of microservices. Executing these processes on demand can, for example, reduce computing resource consumption and network traffic. A demand may include a request for provisioning an external supporting service, such as a collective environment model. The approach handles changing and new demands by dynamically reconciling them through our proposed application management framework built on Kubernetes and the Robot Operating System (ROS 2). We demonstrate the operation of our framework in the C-ITS use case of collective environment perception and make the source code of the prototypical framework publicly available at https://github.com/
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Europe > Hungary > Budapest > Budapest (0.04)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
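The reconciliation of demands described above can be illustrated with a toy diff between the demanded and currently deployed application sets. This is a hedged sketch of the general pattern, not the authors' framework API.

```python
def reconcile(demanded, deployed):
    """Diff the set of demanded application instances against what is
    currently deployed, yielding the actions a demand-driven manager
    would execute next."""
    want, have = set(demanded), set(deployed)
    return {
        "deploy": sorted(want - have),   # demanded but not running
        "delete": sorted(have - want),   # running but no longer demanded
        "keep":   sorted(want & have),   # already satisfied
    }
```

Running such a diff whenever a demand arrives or lapses is what lets the framework free compute resources and network traffic that static deployments would keep consuming.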
Scaling Homomorphic Applications in Deployment
Marinelli, Ryan, Chowdhury, Angelica
In this endeavor, a proof-of-concept homomorphic application is developed to determine the production readiness of encryption ecosystems. A movie recommendation app is implemented for this purpose and productionized through containerization and orchestration. By tuning deployment configurations, the computational limitations of Fully Homomorphic Encryption (FHE) are mitigated through additional infrastructure optimizations.
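One way to reason about mitigating FHE's computational limitations through additional infrastructure is capacity planning: given the large constant-factor slowdown of FHE, size the replica count so that the offered load per replica stays below a utilisation target. The function and all parameter values below are illustrative assumptions, not figures from the paper.

```python
import math

def fhe_replicas(req_per_s: float, plain_latency_s: float,
                 fhe_slowdown: float, target_util: float = 0.7) -> int:
    """Replicas needed to absorb an FHE workload: each request occupies
    a worker for plain_latency_s * fhe_slowdown seconds; keep the mean
    utilisation per replica below target_util."""
    service_time_s = plain_latency_s * fhe_slowdown
    offered_load = req_per_s * service_time_s   # mean busy workers (Erlangs)
    return math.ceil(offered_load / target_util)
```

For instance, 2 requests per second against a 50 ms plaintext path with a hypothetical 100x FHE slowdown calls for 15 replicas, which is the kind of horizontal scale-out that container orchestration makes routine.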
AI Factories: It's time to rethink the Cloud-HPC divide
Lopez, Pedro Garcia, Pons, Daniel Barcelona, Copik, Marcin, Hoefler, Torsten, Quiñones, Eduardo, Malawski, Maciej, Pietzuch, Peter, Marti, Alberto, Timoudas, Thomas Ohlson, Slominski, Aleksander
The strategic importance of artificial intelligence is driving a global push toward Sovereign AI initiatives. National governments are increasingly developing dedicated infrastructures, called AI Factories (AIF), to achieve technological autonomy and secure the resources necessary to sustain robust local digital ecosystems. In Europe, the EuroHPC Joint Undertaking is investing hundreds of millions of euros into several AI Factories, built atop existing high-performance computing (HPC) supercomputers. However, while HPC systems excel in raw performance, they are not inherently designed for usability, accessibility, or serving as public-facing platforms for AI services such as inference or agentic applications. In contrast, AI practitioners are accustomed to cloud-native technologies like Kubernetes and object storage, tools that are often difficult to integrate within traditional HPC environments. This article advocates for a dual-stack approach within supercomputers: integrating both HPC and cloud-native technologies. Our goal is to bridge the divide between HPC and cloud computing by combining high performance and hardware acceleration with ease of use and service-oriented front-ends. This convergence allows each paradigm to amplify the other. To this end, we will study the cloud challenges of HPC (Serverless HPC) and the HPC challenges of cloud technologies (High-performance Cloud).
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Sweden (0.04)
Automated Creation and Enrichment Framework for Improved Invocation of Enterprise APIs as Tools
Agarwal, Prerna, Gupta, Himanshu, Soni, Soujanya, Vallam, Rohith, Sindhgatta, Renuka, Mehta, Sameep
Recent advancements in Large Language Models (LLMs) have led to the development of agents capable of complex reasoning and interaction with external tools. In enterprise contexts, the effective use of such tools, which are often enabled by application programming interfaces (APIs), is hindered by poor documentation, complex input or output schemas, and a large number of operations. These challenges make tool selection difficult and reduce the accuracy of payload formation by up to 25%. We propose ACE, an automated tool creation and enrichment framework that transforms enterprise APIs into LLM-compatible tools. ACE (i) generates enriched tool specifications with parameter descriptions and examples to improve selection and invocation accuracy, and (ii) incorporates a dynamic shortlisting mechanism that filters relevant tools at runtime, reducing prompt complexity while maintaining scalability. We validate our framework on both proprietary and open-source APIs and demonstrate its integration with agentic frameworks. To the best of our knowledge, ACE is the first end-to-end framework that automates the creation, enrichment, and dynamic selection of enterprise API tools for LLM agents.
- North America > United States (0.04)
- Europe > Norway > Norwegian Sea (0.04)
- Asia > Singapore (0.04)
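The abstract does not specify how ACE's dynamic shortlisting works. As a hedged stand-in, even a simple lexical-overlap ranking over enriched tool descriptions shows the shape of runtime filtering; the tool names and descriptions below are invented for illustration.

```python
def shortlist(query: str, tools: dict, k: int = 3) -> list:
    """Rank tool specs by word overlap between the user query and each
    tool's enriched description, keeping only the top k for the prompt."""
    q = set(query.lower().split())
    ranked = sorted(
        tools.items(),
        key=lambda item: len(q & set(item[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

tools = {
    "create_ticket": "create a new support ticket",
    "delete_ticket": "delete an existing support ticket",
    "list_users": "list all users in the account",
}
# shortlist("create a ticket for support", tools, k=1) -> ["create_ticket"]
```

A production system would likely use embedding similarity instead of word overlap, but the payoff is the same: only a handful of relevant specs reach the prompt, keeping it small as the API catalogue grows.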
MAIA: A Collaborative Medical AI Platform for Integrated Healthcare Innovation
Bendazzoli, Simone, Persson, Sanna, Astaraki, Mehdi, Pettersson, Sebastian, Grozman, Vitali, Moreno, Rodrigo
Artificial Intelligence (AI) integration in healthcare has emerged as a transformative force, promising to revolutionize patient care, optimize resource allocation, and enhance clinical decision-making [2, 10]. As the healthcare ecosystem increasingly recognizes the importance of AI-powered tools, there is a growing need for collaborative platforms to facilitate the development, deployment, and management of AI solutions in medical settings [7, 13]. Modern healthcare institutions face complex challenges that demand sophisticated technological solutions. A comprehensive medical AI platform can serve as a powerful foundation for addressing these needs, effectively bridging technological capabilities with clinical requirements. One open challenge in healthcare is the management of the vast amounts of data handled in clinical settings. Cloud-based medical AI platforms can provide new opportunities for computational resource sharing, enabling institutions to optimize data storage and build collaborative research environments. By creating a unified and standardised ecosystem, these platforms break down traditional institutional barriers, facilitating knowledge exchange between medical professionals, data scientists, and researchers.
- Workflow (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Simplifying Root Cause Analysis in Kubernetes with StateGraph and LLM
Xiang, Yong, Chen, Charley Peter, Zeng, Liyi, Yin, Wei, Liu, Xin, Li, Hu, Xu, Wei
Kubernetes, a notably complex and distributed system, utilizes an array of controllers to uphold cluster management logic through state reconciliation. Nevertheless, maintaining state consistency presents significant challenges due to unexpected failures, network disruptions, and asynchronous issues, especially within dynamic cloud environments. These challenges result in operational disruptions and economic losses, underscoring the necessity for robust root cause analysis (RCA) to enhance Kubernetes reliability. The development of large language models (LLMs) presents a promising direction for RCA. However, existing methodologies encounter several obstacles, including the diverse and evolving nature of Kubernetes incidents, the intricate context of incidents, and the polymorphic nature of these incidents. In this paper, we introduce SynergyRCA, an innovative tool that leverages LLMs with retrieval augmentation from graph databases and enhancement with expert prompts. SynergyRCA constructs a StateGraph to capture spatial and temporal relationships and utilizes a MetaGraph to outline entity connections. Upon the occurrence of an incident, an LLM predicts the most pertinent resource, and SynergyRCA queries the MetaGraph and StateGraph to deliver context-specific insights for RCA. We evaluate SynergyRCA using datasets from two production Kubernetes clusters, highlighting its capacity to identify numerous root causes, including novel ones, with high efficiency and precision. SynergyRCA demonstrates the ability to identify root causes in an average time of about two minutes and achieves an impressive precision of approximately 0.90.
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
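The StateGraph idea, capturing how each resource's state evolves over time, can be sketched in a few lines. This toy version (data layout invented) only records per-resource transition edges, whereas SynergyRCA additionally models spatial relations between entities via its MetaGraph.

```python
from collections import defaultdict

def build_state_graph(events):
    """events: iterable of (timestamp, resource, state) tuples.
    Returns, per resource, its chronological state-transition edges,
    a toy stand-in for the temporal dimension of a StateGraph."""
    graph = defaultdict(list)
    last_state = {}
    for ts, resource, state in sorted(events):
        prev = last_state.get(resource)
        if prev is not None and prev != state:
            graph[resource].append((prev, state, ts))
        last_state[resource] = state
    return dict(graph)
```

Given such a graph, an RCA query for an incident on a pod amounts to walking its recent transitions (e.g. Running to CrashLoopBackOff) and the transitions of related resources in the same window.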
Taming the Memory Beast: Strategies for Reliable ML Training on Kubernetes
Kubernetes offers a powerful orchestration platform for machine learning training, but memory management can be challenging due to specialized needs and resource constraints. This paper outlines how Kubernetes handles memory requests, limits, Quality of Service classes, and eviction policies for ML workloads, with special focus on GPU memory and ephemeral storage. Common pitfalls such as overcommitment, memory leaks, and ephemeral volume exhaustion are examined. We then provide best practices for stable, scalable memory utilization to help ML practitioners prevent out-of-memory events and ensure high-performance ML training pipelines.
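The QoS classes mentioned above follow fixed rules: a pod is Guaranteed when every container pins requests equal to limits for both CPU and memory, BestEffort when no requests or limits are set at all, and Burstable otherwise. A simplified sketch of that classification (the real kubelet logic also covers init containers and other resource types):

```python
def qos_class(containers):
    """Approximate Kubernetes QoS classification for a pod.
    containers: list of dicts with optional 'requests'/'limits'
    maps keyed by 'cpu' and 'memory'."""
    resources = ("cpu", "memory")
    guaranteed = all(
        c.get("requests", {}).get(r) is not None
        and c.get("requests", {}).get(r) == c.get("limits", {}).get(r)
        for c in containers for r in resources
    )
    if guaranteed:
        return "Guaranteed"
    if any(c.get("requests") or c.get("limits") for c in containers):
        return "Burstable"
    return "BestEffort"
```

The class matters for ML training because BestEffort and Burstable pods are evicted first under node memory pressure, so long-running trainers generally want Guaranteed memory settings.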
A generic approach for reactive stateful mitigation of application failures in distributed robotics systems deployed with Kubernetes
Mirus, Florian, Pasch, Frederik, Singhal, Nikhil, Scholl, Kay-Ulrich
Offloading computationally expensive algorithms to the edge or even cloud offers an attractive option to tackle limitations regarding on-board computational and energy resources of robotic systems. In cloud-native applications deployed with the container management system Kubernetes (K8s), one key problem is ensuring resilience against various types of failures. However, complex robotic systems interacting with the physical world pose a very specific set of challenges and requirements that are not yet covered by failure mitigation approaches from the cloud-native domain. In this paper, we therefore propose a novel approach for robotic system monitoring and stateful, reactive failure mitigation for distributed robotic systems deployed using Kubernetes (K8s) and the Robot Operating System (ROS2). By employing the generic substrate of Behaviour Trees, our approach can be applied to any robotic workload and supports arbitrarily complex monitoring and failure mitigation strategies. We demonstrate the effectiveness and application-agnosticism of our approach on two example applications, namely Autonomous Mobile Robot (AMR) navigation and robotic manipulation in a simulated environment.
- North America > United States (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Research Report > Promising Solution (0.34)
- Overview > Innovation (0.34)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
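The Behaviour Tree substrate mentioned above composes monitoring conditions and mitigation actions. A minimal fallback (selector) node, the building block behind "if the health check fails, run the recovery action", can be sketched as follows; the node implementations are invented placeholders, not the authors' code.

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

def fallback(*children):
    """Fallback (selector) node: tick children left to right and return
    the first non-FAILURE status; fail only if every child fails."""
    def tick():
        for child in children:
            status = child()
            if status != FAILURE:
                return status
        return FAILURE
    return tick

# Reactive recovery: if the health check fails, run the mitigation.
node_healthy = lambda: FAILURE      # monitor reports an unhealthy workload
redeploy_pod = lambda: SUCCESS      # mitigation: ask K8s to redeploy it
recover = fallback(node_healthy, redeploy_pod)
# recover() -> "SUCCESS" (the mitigation path was taken)
```

Because fallbacks compose with sequences and arbitrary leaf actions, the same generic tree structure can encode both AMR navigation recovery and manipulation recovery, which is the application-agnosticism the paper claims.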