Collaborating Authors

 Kim, Seah


DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads

arXiv.org Artificial Intelligence

Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control involve dynamic behaviors at multiple granularities: tasks, models, and layers within a model. Such dynamic behaviors introduce new challenges to the system software of an ML system, since the overall system load is not fully predictable, unlike in traditional ML workloads. In addition, RTMM workloads require real-time processing, involve highly heterogeneous models, and target resource-constrained devices. Under such circumstances, an effective scheduler becomes increasingly important for utilizing the underlying hardware well, given the unique characteristics of RTMM workloads. Therefore, we propose a new scheduler, DREAM, which effectively handles the various forms of dynamicity in RTMM workloads on multi-accelerator systems. DREAM quantifies the unique requirements of RTMM workloads and uses the quantified scores to drive scheduling decisions, taking into account the current system load and other inference jobs on different models and input frames. DREAM employs tunable parameters that provide fast and effective adaptivity to dynamic workload changes. In our evaluation of five RTMM workload scenarios, DREAM reduces the overall UXCost, an RTMM-equivalent metric of the energy-delay product (EDP) defined in the paper, by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines, demonstrating the efficacy of our scheduling methodology.
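
As a rough illustration of score-driven scheduling of the kind described above, the sketch below ranks pending inference tasks by combining each task's deadline slack with the current system load through tunable weights. The Task fields, the scoring formula, and the weight names are illustrative assumptions made for this sketch, not DREAM's actual algorithm or terminology.

```python
# Minimal sketch of score-based scheduling for real-time multi-model inference.
# All names (Task, slack/load weights, the score formula) are illustrative
# assumptions, not the paper's actual DREAM scheduler.
from dataclasses import dataclass, field
import heapq
import time


@dataclass(order=True)
class Task:
    score: float
    name: str = field(compare=False)
    deadline_s: float = field(compare=False)     # absolute deadline (seconds)
    est_latency_s: float = field(compare=False)  # estimated execution time


def score_task(task: Task, now: float, system_load: float,
               w_slack: float = 1.0, w_load: float = 0.5) -> float:
    """Combine per-task deadline slack with current system load into one score.
    Lower score = schedule sooner. The weights are the kind of tunable
    parameters that let a scheduler adapt to dynamic workload changes."""
    slack = (task.deadline_s - now) - task.est_latency_s
    return w_slack * slack - w_load * system_load * task.est_latency_s


def schedule(tasks: list[Task], system_load: float) -> list[Task]:
    """Return tasks in the order they should be dispatched (lowest score first)."""
    now = time.monotonic()
    heap: list[Task] = []
    for t in tasks:
        t.score = score_task(t, now, system_load)
        heapq.heappush(heap, t)
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    now = time.monotonic()
    tasks = [
        Task(0.0, "hand_tracking", deadline_s=now + 0.010, est_latency_s=0.004),
        Task(0.0, "depth_estimation", deadline_s=now + 0.033, est_latency_s=0.015),
        Task(0.0, "eye_gaze", deadline_s=now + 0.008, est_latency_s=0.002),
    ]
    for t in schedule(tasks, system_load=0.7):
        print(f"{t.name}: score={t.score:.4f}")
```

In this toy version, tasks with less slack or heavier estimated work under high load float to the front of the queue; a real scheduler would also account for model heterogeneity and per-accelerator placement.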


MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks

arXiv.org Artificial Intelligence

Driven by the wide adoption of deep neural networks (DNNs) across different application domains, multi-tenancy execution, where multiple DNNs are deployed simultaneously on the same hardware, has been proposed to satisfy the latency requirements of different applications while improving overall system utilization. However, multi-tenancy execution can lead to undesired system-level resource contention, causing quality-of-service (QoS) degradation for latency-critical applications. To address this challenge, we propose MoCA, an adaptive multi-tenancy system for DNN accelerators. Unlike existing solutions that focus on partitioning compute resources, MoCA dynamically manages the shared memory resources of co-located applications to meet their QoS targets. Specifically, MoCA leverages the regularities in both DNN operators and accelerators to dynamically modulate memory access rates based on latency targets and user-defined priorities, so that co-located applications get the resources they demand without significantly starving their co-runners. We demonstrate that MoCA improves the service level agreement (SLA) satisfaction rate by up to 3.9x (1.8x on average), system throughput by 2.3x (1.7x on average), and fairness by 1.3x (1.2x on average), compared to prior work.
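
The sketch below shows one way such per-tenant memory-access-rate modulation could look: each control step nudges a tenant's allowed memory access rate up or down in proportion to how far its observed latency is from its target, scaled by a user-defined priority. The Tenant fields, the proportional update rule, and all parameter names are assumptions made for illustration, not MoCA's actual controller.

```python
# Minimal sketch of priority-aware memory-access-rate throttling for co-located
# DNN tenants. Class names, fields, and the update rule are illustrative
# assumptions, not MoCA's actual mechanism.
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    priority: float           # user-defined priority weight (higher = more important)
    target_latency_ms: float  # per-operator latency target
    observed_latency_ms: float
    mem_rate: float = 1.0     # fraction of unthrottled memory access rate (0..1]


def adjust_rates(tenants: list[Tenant], step: float = 0.1,
                 min_rate: float = 0.1) -> None:
    """One control step: tenants missing their latency target gain memory
    access rate; tenants with slack give some up, scaled by priority."""
    for t in tenants:
        # Positive error -> behind target; negative error -> ahead of target.
        error = (t.observed_latency_ms - t.target_latency_ms) / t.target_latency_ms
        t.mem_rate += step * t.priority * error
        t.mem_rate = max(min_rate, min(1.0, t.mem_rate))


if __name__ == "__main__":
    tenants = [
        Tenant("latency_critical_dnn", priority=2.0,
               target_latency_ms=5.0, observed_latency_ms=7.0, mem_rate=0.6),
        Tenant("best_effort_dnn", priority=0.5,
               target_latency_ms=50.0, observed_latency_ms=30.0),
    ]
    adjust_rates(tenants)
    for t in tenants:
        print(f"{t.name}: mem_rate={t.mem_rate:.2f}")
```

Here the latency-critical tenant, which is behind its target, gains memory bandwidth while the best-effort tenant gives a little up; clamping to a minimum rate keeps co-runners from being starved outright.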