operational resilience
Streamlining Resilient Kubernetes Autoscaling with Multi-Agent Systems via an Automated Online Design Framework
Soulé, Julien; Jamont, Jean-Paul; Occello, Michel; Traonouez, Louis-Marie; Théron, Paul
In cloud-native systems, Kubernetes clusters with interdependent services often face threats to their operational resilience from poor workload management, such as resource blocking, bottlenecks, or continuous pod crashes. These vulnerabilities are further amplified in adversarial scenarios such as Distributed Denial-of-Service (DDoS) attacks. Conventional Horizontal Pod Autoscaling (HPA) approaches struggle to address such dynamic conditions. Reinforcement learning-based methods are more adaptable, but they typically optimize a single goal such as latency or resource usage and neglect broader failure scenarios. We propose decomposing the overarching goal of maintaining operational resilience into failure-specific sub-goals delegated to collaborative agents, which collectively form an HPA Multi-Agent System (MAS). We introduce an automated, four-phase online framework for HPA MAS design: 1) modeling a digital twin built from cluster traces; 2) training agents in simulation using roles and missions tailored to failure contexts; 3) analyzing agent behaviors for explainability; and 4) transferring learned policies to the real cluster. Experimental results demonstrate that the generated HPA MASs outperform three state-of-the-art HPA systems in sustaining operational resilience under various adversarial conditions in a proposed complex cluster.
Cloud-native critical systems increasingly rely on Kubernetes to orchestrate and manage interdependent services [1]. HPA is a widely adopted mechanism that dynamically adjusts the number of pods based on resource usage, enabling systems to handle highly dynamic workloads [2]. However, failures such as pod crashes, resource contention, and bottlenecks can severely jeopardize the performance of all of the cluster's functionalities, whose preservation we globally refer to as operational resilience [3].
Worse, these failures may be exploited by attackers to degrade performance or induce outages, as seen in adversarial contexts like DDoS attacks [4].
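To make concrete what a conventional HPA computes, the sketch below implements the scaling rule documented for Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). No change is made when the ratio lies within a tolerance band, which defaults to 0.1, and the result is clamped to the minReplicas/maxReplicas bounds. This is a simplified illustration, not the authors' code. The function name `hpa_desired_replicas` and its parameters are hypothetical. The sketch also omits stabilization windows, pod readiness handling, and multi-metric aggregation. Because the rule reacts to a single utilization ratio, it has no built-in notion of the distinct failure contexts (crashes, contention, bottlenecks) that the MAS approach assigns to separate agents.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Hypothetical sketch of the single-metric Kubernetes HPA scaling rule."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")

    # Ratio of observed usage to the target (e.g. 90% CPU vs. 60% target -> 1.5).
    ratio = current_metric / target_metric

    # Inside the tolerance band around 1.0, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        # Scale proportionally and round up so capacity is never under-provisioned.
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured minReplicas / maxReplicas bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods at 90% CPU with a 60% target -> scale out to 6 pods.
    print(hpa_desired_replicas(4, 90, 60))
    # A sustained traffic flood can only push the count up to max_replicas.
    print(hpa_desired_replicas(10, 200, 50))
```

The second call hints at why DDoS conditions are hard for this rule. Under a sustained flood, it simply saturates at `max_replicas`, and the root cause is never addressed.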
From Banks to Bananas: The Future of AI for IT Operations
The concept of artificial intelligence (AI) has far-reaching promise and applicability across the business and technology landscape. Thanks to the growth of big data, machine learning, analytics, and blazing computational speeds, the use of artificial intelligence has matured and now plays a critical role in every major vertical, from banking to retail to logistics. This opens opportunities for organizations to combine the power of AI and the mainframe to drive higher levels of operational resilience across their IT environment. To compete, enterprises need to ensure that their business processes can scale and perform flawlessly to delight their customers. Whether purchasing groceries, booking a flight, or trading stock, consumers expect sub-second response times, 24/7.
The Fintech Future: Accelerating the AI & ML Journey
Artificial intelligence (AI) has assumed a growing influence within financial services in recent years, affecting areas such as credit decisions, risk management, fraud detection, and stress testing. For many fintechs, it has been baked into the process from the outset, to the extent that the AI-in-fintech market was valued at $6 billion in 2019 and is expected to reach $22 billion by 2025. Economic fallout from the pandemic, however, has accelerated the timetable for financial services firms to become mass adopters of AI and harness its predictive powers sooner rather than later. For digitally native fintechs, many of which have already embraced AI and its capabilities, this offers the opportunity to invest further in the technology and capitalise on the tools available to accelerate their journeys. Fintechs across the world are dealing with the effects of Covid-19 and face an uphill challenge in containing its impact on the financial system and the broader economy. With rising unemployment and stagnating economies, individuals and companies are struggling with debt, while the world in general is awash in credit risk.
Council Post: Operational Resilience And Digital Transformation Hinge On Getting A Grasp On Data
Asheesh Mehra is the Co-Founder and Group CEO of AntWorks, a global leader in AI and robotics. This public health crisis has created a barrage of significant and often difficult-to-address challenges that individuals and organizations have never faced before. The global pandemic altered demand for products and services in every industry sector. In the process, it has exposed weaknesses in global supply chains and service networks. We're all living through this together, and I'm sure we can all agree it hasn't been easy.
How artificial intelligence can improve resilience in mineral processing during uncertain times
As COVID-19 continues to affect millions of lives and livelihoods, it is delivering perhaps the most significant shock to industries (from education to healthcare to food supply) in almost a century. Mineral processing companies also have to grapple with profound uncertainty and volatility. Before COVID-19, some were already taking steps to build their capabilities to cope with the fluctuations inherent in commodities markets. But recent events, which disrupted workforce availability, supply chains, and demand, created an urgent need for higher levels of operational resilience. This is where recent advances in artificial intelligence (AI) have helped.