STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
Neural Information Processing Systems
In cloud-scale systems, failures are the norm. A distributed computing cluster exhibits hundreds of machine failures and thousands of disk failures; software bugs and misconfigurations are reported to be more frequent. The demand for autonomous, AI-driven reliability engineering continues to grow, as existing human-in-the-loop practices can hardly keep up with the scale of modern clouds. This paper presents STRATUS, an LLM-based multi-agent system for realizing autonomous Site Reliability Engineering (SRE) of cloud services.
Jun-16-2026, 23:55:25 GMT
- Country:
- North America > United States (0.28)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Information Technology > Services (1.00)
- Technology: