Batching-Aware Joint Model Onloading and Offloading for Hierarchical Multi-Task Inference
Seohyeon Cha, Kevin Chan, Gustavo de Veciana, Haris Vikalo
arXiv.org Artificial Intelligence
The growing demand for intelligent services on resource-constrained edge devices has spurred the development of collaborative inference systems that distribute workloads across end devices, edge servers, and the cloud. While most existing frameworks focus on single-task, single-model scenarios, many real-world applications (e.g., autonomous driving and augmented reality) require concurrent execution of diverse tasks including detection, segmentation, and depth estimation. In this work, we propose a unified framework to jointly decide which multi-task models to deploy ("onload") at clients and edge servers, and how to route queries across the hierarchy ("offload") to maximize overall inference accuracy under memory, compute, and communication constraints. We formulate this as a mixed-integer program and introduce J3O (Joint Optimization of Onloading and Offloading), an alternating algorithm that (i) greedily selects models to onload via Lagrangian-relaxed submodular optimization and (ii) determines optimal offloading via constrained linear programming. We further extend J3O to account for batching at the edge, maintaining scalability under heterogeneous task loads. Experiments show J3O consistently achieves over 97% of the optimal accuracy while incurring less than 15% of the runtime required by the optimal solver across multi-task benchmarks.

The rapid proliferation of edge devices, including smartphones, surveillance cameras, and wearables, with possible latency and privacy requirements, has sparked interest in executing machine learning (ML)-based inference at the edge [1]. However, as state-of-the-art ML models continue to grow in size and complexity to achieve higher accuracy, their memory and compute requirements often exceed the capabilities of resource-constrained edge hardware [2], [3].
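As a rough illustration of the onloading idea summarized in the abstract, the sketch below greedily picks multi-task models under a memory budget, scoring each candidate by its marginal accuracy gain minus a Lagrangian-style penalty on memory. This is not the paper's J3O algorithm: the objective (sum over tasks of the best accuracy among selected models), the names `coverage` and `greedy_onload`, the fixed multiplier `lam`, and all numbers are assumptions made for illustration only.

```python
def coverage(selected, acc):
    """Toy objective (assumed): per task, take the best accuracy among
    the selected models, then sum over tasks."""
    if not selected:
        return 0.0
    num_tasks = len(next(iter(acc.values())))
    return sum(max(acc[m][t] for m in selected) for t in range(num_tasks))


def greedy_onload(acc, mem, budget, lam):
    """Greedily add the model with the largest penalized marginal gain.

    acc:    dict model -> list of per-task accuracies (hypothetical data)
    mem:    dict model -> memory footprint
    budget: total memory available on the device
    lam:    fixed multiplier pricing each unit of memory (assumed given)
    """
    selected, used = [], 0.0
    while True:
        base = coverage(selected, acc)
        best, best_score = None, 0.0
        for m in acc:
            # Skip models already chosen or that would exceed the budget.
            if m in selected or used + mem[m] > budget:
                continue
            score = coverage(selected + [m], acc) - base - lam * mem[m]
            if score > best_score:
                best, best_score = m, score
        if best is None:  # no candidate improves the penalized objective
            return selected
        selected.append(best)
        used += mem[best]


# Hypothetical models covering detection, segmentation, depth estimation.
acc = {
    "det": [0.9, 0.0, 0.0],
    "seg_depth": [0.0, 0.8, 0.7],
    "multi": [0.7, 0.7, 0.6],
}
mem = {"det": 2.0, "seg_depth": 3.0, "multi": 4.0}
print(greedy_onload(acc, mem, budget=5.0, lam=0.1))
```

In this toy instance the greedy rule commits to the single multi-task model first and then has no room for the two specialized models, which together would fit the budget and score higher. That gap is the kind of suboptimality that motivates comparing a greedy heuristic against an exact solver, as the abstract's accuracy and runtime figures do.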
Aug-20-2025
- Country:
- North America > United States
- Maryland > Prince George's County
- Adelphi (0.04)
- Texas > Travis County
- Austin (0.14)
- Genre:
- Research Report (0.82)
- Industry:
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.46)
- Representation & Reasoning > Optimization (0.66)
- Vision (1.00)