Analytically-Driven Resource Management for Cloud-Native Microservices
Zhang, Yanqi, Zhou, Zhuangzhuang, Elnikety, Sameh, Delimitrou, Christina
Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges, including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLAs, and maps each per-service SLA to an individual resource allocation per microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually and swiftly stops exploration if latency exceeds its SLA. We evaluate Ursa on a set of representative end-to-end microservice topologies, including a social network, a media service, and a video processing pipeline, each consisting of multiple classes and priorities of requests with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared to these ML-driven approaches, Ursa provides significant advantages: it shortens the data collection process by more than 128x, and its control plane is 43x faster. At the same time, Ursa does not sacrifice resource efficiency or SLAs. During online deployment, Ursa reduces the SLA violation rate by 9.0% to 49.9%, and reduces CPU allocation by up to 86.2% compared to ML-driven approaches.
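The core idea in the abstract can be illustrated with a minimal sketch: split the end-to-end SLA into per-tier latency budgets, then search each tier's allocation independently, stopping as soon as the tier meets its budget. The tier names, the equal-split decomposition, and the `latency_model` stand-in below are all illustrative assumptions, not Ursa's actual analytical model.

```python
# Hedged sketch: per-tier SLA decomposition and independent per-tier
# exploration with early stopping. Not Ursa's real model or API.

def split_sla(end_to_end_sla_ms, tiers):
    """Divide the end-to-end SLA into equal per-tier budgets (a naive
    decomposition; Ursa derives budgets analytically)."""
    return {t: end_to_end_sla_ms / len(tiers) for t in tiers}

def latency_model(tier, cpus):
    """Stand-in for a measured tier latency: shrinks as CPUs grow."""
    base = {"frontend": 40.0, "cache": 20.0, "db": 80.0}[tier]
    return base / cpus

def explore_tier(tier, budget_ms, max_cpus=16):
    """Grow the tier's allocation until latency fits its budget, and
    stop the search immediately once the budget is met."""
    for cpus in range(1, max_cpus + 1):
        if latency_model(tier, cpus) <= budget_ms:
            return cpus          # smallest allocation meeting the per-tier SLA
    return max_cpus              # budget unreachable; return the cap

tiers = ["frontend", "cache", "db"]
budgets = split_sla(60.0, tiers)                  # 20 ms budget per tier
alloc = {t: explore_tier(t, budgets[t]) for t in tiers}
print(alloc)                                      # {'frontend': 2, 'cache': 1, 'db': 4}
```

Because each tier is explored against its own budget, the search space is linear in the number of tiers rather than combinatorial across them, which is one way to read the abstract's claim about shortening data collection.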
Position Paper: Embracing Heterogeneity—Improving Energy Efficiency for Interactive Services on Heterogeneous Data Center Hardware
He, Yuxiong (Microsoft Research) | Elnikety, Sameh (Microsoft Research)
Data centers today are heterogeneous: they have servers from multiple generations and multiple vendors; server machines have multiple cores capable of running at different speeds, and some have general-purpose graphics processing units (GPGPUs). Hardware trends indicate that future processors will have heterogeneous cores with different speeds and capabilities. This environment enables new advances in power saving and application optimization. It also poses new challenges, as current systems software is ill-suited for heterogeneity. In this position paper, we focus on interactive applications and outline some of the techniques to embrace heterogeneity. We show that heterogeneity can be exploited to deliver interactive services in an energy-efficient manner. For example, our initial study suggests that neither high-end nor low-end servers alone are very effective in serving a realistic workload, which typically has requests with varying service demands. High-end servers achieve good throughput, but the energy costs are high. Low-end servers are energy-efficient for short requests, but they may not be able to serve long requests at the desired quality of service. In this work, we show that a heterogeneous system can be a better choice than an equivalent homogeneous system for delivering interactive services in a cost-effective manner, transforming heterogeneity from a resource management nightmare into an asset. We highlight some of the challenges and opportunities, and the role of AI and machine learning techniques, for hosting large interactive services in data centers.
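The abstract's argument that short requests belong on low-end servers and long requests on high-end ones can be sketched as a demand-aware dispatcher. The speeds, power draws, and thresholds below are invented for illustration; the paper's actual policies and measurements are not reproduced here.

```python
# Hedged sketch: dispatch each request to the cheapest server class that
# still meets its deadline. All numbers are illustrative assumptions.

HIGH_END = {"speed": 4.0, "watts": 120.0}   # work units/sec, power draw
LOW_END  = {"speed": 1.0, "watts": 15.0}

def dispatch(work_units, deadline_s):
    """Try the energy-efficient class first; fall back to high-end."""
    for server in (LOW_END, HIGH_END):
        if work_units / server["speed"] <= deadline_s:
            return server
    return HIGH_END                          # best effort if neither fits

def energy_joules(work_units, deadline_s):
    """Energy spent serving the request on the chosen server class."""
    server = dispatch(work_units, deadline_s)
    return (work_units / server["speed"]) * server["watts"]

# A short request fits its deadline on the low-end server (7.5 J here,
# versus 15 J if it ran on the high-end server); a long request needs
# the high-end server to meet the same deadline.
assert dispatch(0.5, 1.0) is LOW_END
assert dispatch(3.0, 1.0) is HIGH_END
```

This toy policy captures the abstract's point: a mix of server classes lets short requests run cheaply while long requests still meet their quality-of-service targets, which neither class achieves alone.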