KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Li, Baolin, Samsi, Siddharth, Gadepally, Vijay, Tiwari, Devesh
–arXiv.org Artificial Intelligence
Online inference is becoming a key service offering for many businesses, deployed on cloud platforms to meet customer demand. Despite their revenue-generating capability, these services must operate under tight Quality-of-Service (QoS) and cost budget constraints. This paper introduces KAIROS, a novel runtime framework that maximizes query throughput while meeting a QoS target and a cost budget. KAIROS designs and implements novel techniques to build a pool of heterogeneous compute hardware without online exploration overhead, and to distribute inference queries optimally across it at runtime. Our evaluation using industry-grade deep learning (DL) models shows that KAIROS yields up to 2X the throughput of an optimal homogeneous solution and outperforms state-of-the-art schemes by up to 70%, even when the competing schemes are favorably implemented so that their exploration overhead is ignored.
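The abstract describes two steps: building a heterogeneous instance pool under a cost budget, and splitting query load across it at runtime. A minimal sketch of that idea is below; this is an illustration under assumed inputs, not the paper's actual algorithm. All names (`InstanceType`, `qps_within_qos`), prices, and the greedy throughput-per-dollar heuristic are hypothetical stand-ins for KAIROS's exploration-free pool builder and optimal query distributor.

```python
# Illustrative sketch only -- not the KAIROS algorithm itself.
# Step 1: pick heterogeneous instance types under a cost budget.
# Step 2: split query load in proportion to each instance's QoS-safe rate.

from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    cost_per_hour: float   # hypothetical cloud price
    qps_within_qos: float  # max queries/sec while still meeting the QoS target

def build_pool(types, budget):
    """Greedy pool construction by throughput-per-dollar (a simple stand-in
    for an exploration-free pool builder)."""
    ranked = sorted(types, key=lambda t: t.qps_within_qos / t.cost_per_hour,
                    reverse=True)
    pool, spent = [], 0.0
    for t in ranked:
        # Buy copies of each cost-efficient type while the budget allows.
        while spent + t.cost_per_hour <= budget:
            pool.append(t)
            spent += t.cost_per_hour
    return pool, spent

def distribute(pool, total_qps):
    """Split the incoming load proportionally to each instance's
    QoS-compliant service rate, so no instance is overloaded."""
    capacity = sum(t.qps_within_qos for t in pool)
    if total_qps > capacity:
        raise ValueError("load exceeds QoS-safe capacity of the pool")
    return [total_qps * t.qps_within_qos / capacity for t in pool]

# Example with made-up instance types and prices:
types = [InstanceType("gpu-large", 3.0, 900.0),
         InstanceType("gpu-small", 1.0, 400.0)]
pool, spent = build_pool(types, budget=5.0)
shares = distribute(pool, total_qps=1000.0)
```

In this toy setup the smaller instance wins on throughput-per-dollar, so the budget is spent entirely on copies of it; with real profiles a mixed pool can beat any single-type (homogeneous) deployment, which is the gap the paper's 2X result points at.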
May-2-2023