Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows

Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg

arXiv.org Artificial Intelligence 

We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state of the art schedulers shows a significant reduction in completion times.

Yet intelligent edge applications differ from cloud microservices in important ways, so we cannot just use the same techniques employed in web frameworks. Whereas the outer tiers of today's cloud are dominated by lightweight, stateless, containerized applications that can be upscaled or downscaled at low cost, ML depends on large objects (hyperparameters, model parameters, and supporting databases) and often entails hardware-accelerated computation using devices preconfigured with the proper firmware. When shifting a task to a device that has not previously run it, computation cannot begin until all the prerequisites are in place. We can and do launch new ML instances when additional capacity is needed, but scheduling strategies must evolve to avoid thrashing.
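The placement policy sketched in the abstract (prefer workers that already hold a task's data dependencies, collocate tasks from the same job, and never overload a GPU) can be illustrated with a small scoring heuristic. This is a minimal sketch under assumed data structures; the class and function names (`Worker`, `place_task`) and the weights are hypothetical, not Compass's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    """Hypothetical GPU-enabled worker; fields are illustrative only."""
    name: str
    cached_objects: set = field(default_factory=set)  # model params, databases already resident
    gpu_free_mb: int = 0                              # free GPU memory
    running_jobs: set = field(default_factory=set)    # jobs with tasks already placed here

def place_task(task_deps, task_mem_mb, job_id, workers):
    """Pick a worker: reward satisfied data dependencies and same-job
    collocation, but skip any worker whose GPU memory would be overloaded."""
    best, best_score = None, float("-inf")
    for w in workers:
        if w.gpu_free_mb < task_mem_mb:
            continue  # placing here would overload the GPU and risk thrashing
        locality = len(task_deps & w.cached_objects)  # prerequisites already in place
        collocate = 1 if job_id in w.running_jobs else 0
        score = 2 * locality + collocate              # illustrative weights
        if score > best_score:
            best, best_score = w, score
    return best
```

For example, a worker that already caches a task's model parameters outscores an idle worker that merely runs a sibling task of the same job, since avoiding a large object transfer usually saves more latency than collocation.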
