resource cost
Lift What You Can: Green Online Learning with Heterogeneous Ensembles
Köbschall, Kirsten, Buschjäger, Sebastian, Fischer, Raphael, Hartung, Lisa, Kramer, Stefan
Ensemble methods for stream mining necessitate managing multiple models and updating them as data distributions evolve. Considering the calls for more sustainability, established methods are however not sufficiently considerate of ensemble members' computational expenses and instead overly focus on predictive capabilities. To address these challenges and enable green online learning, we propose heterogeneous online ensembles (HEROS). For every training step, HEROS chooses a subset of models from a pool of models initialized with diverse hyperparameter choices under resource constraints to train. We introduce a Markov decision process to theoretically capture the trade-offs between predictive performance and sustainability constraints. Based on this framework, we present different policies for choosing which models to train on incoming data. Most notably, we propose the novel $ζ$-policy, which focuses on training near-optimal models at reduced costs. Using a stochastic model, we theoretically prove that our $ζ$-policy achieves near optimal performance while using fewer resources compared to the best performing policy. In our experiments across 11 benchmark datasets, we find empiric evidence that our $ζ$-policy is a strong contribution to the state-of-the-art, demonstrating highly accurate performance, in some cases even outperforming competitors, and simultaneously being much more resource-friendly.
A Preliminaries
A.1 Preliminaries on Comparison of Exploration Space The size of exploration space for the layer-wise manner in SET algorithm is: null N When the corresponding exploration region of each group is defined as in Section 3.4, the exploration EEG2 datasets are shown in Figure 9, Figure 10 and Figure 11, respectively. The eNRF sizes shown in a sub-figure are from the first dynamic sparse CNN layer with a specific sparsity ratio (e.g. Figure 10: The statistics of eNRF sizes for the Daily Sport dataset.The eNRF sizes shown in a S = 50% (left), S = 70% (middle), and S = 90% (right)). Figure 11: The statistics of eNRF sizes for the EEG2 dataset.The eNRF sizes shown in a sub-figure To demonstrate the advantage of various eNRF coverage, we compare the performances in terms of test accuracy between DSN models with various eNRF coverage and optimal NRF. However, when the kernel size is oversized (e.g. As discussed in 4.5 and B.1, grouping the kernels in each dynamic sparse CNN layer The corresponding exploration regions for each group are shown in the red boxes.
Communication and Computation Efficient Split Federated Learning in O-RAN
Gu, Shunxian, You, Chaoqun, Ren, Bangbang, Guo, Deke
The hierarchical architecture of Open Radio Access Network (O-RAN) has enabled a new Federated Learning (FL) paradigm that trains models using data from non- and near-real-time (near-RT) Radio Intelligent Controllers (RICs). However, the ever-increasing model size leads to longer training time, jeopardizing the deadline requirements for both non-RT and near-RT RICs. To address this issue, split federated learning (SFL) offers an approach by offloading partial model layers from near-RT-RIC to high-performance non-RT-RIC. Nonetheless, its deployment presents two challenges: (i) Frequent data/gradient transfers between near-RT-RIC and non-RT-RIC in SFL incur significant communication cost in O-RAN. (ii) Proper allocation of computational and communication resources in O-RAN is vital to satisfying the deadline and affects SFL convergence. Therefore, we propose SplitMe, an SFL framework that exploits mutual learning to alternately and independently train the near-RT-RIC's model and the non-RT-RIC's inverse model, eliminating frequent transfers. The ''inverse'' of the inverse model is derived via a zeroth-order technique to integrate the final model. Then, we solve a joint optimization problem for SplitMe to minimize overall resource costs with deadline-aware selection of near-RT-RICs and adaptive local updates. Our numerical results demonstrate that SplitMe remarkably outperforms FL frameworks like SFL, FedAvg and O-RANFed regarding costs and convergence.
An Economic Framework for 6-DoF Grasp Detection
Wu, Xiao-Ming, Cai, Jia-Feng, Jiang, Jian-Jian, Zheng, Dian, Wei, Yi-Lin, Zheng, Wei-Shi
Robotic grasping in clutters is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection, aiming to economize the resource cost in training and meanwhile maintain effective grasp performance. To begin with, we discover that the dense supervision is the bottleneck of current SOTA methods that severely encumbers the entire training overload, meanwhile making the training difficult to converge. To solve the above problem, we first propose an economic supervision paradigm for efficient and effective grasping. This paradigm includes a well-designed supervision selection strategy, selecting key labels basically without ambiguity, and an economic pipeline to enable the training after selection. Furthermore, benefit from the economic supervision, we can focus on a specific grasp, and thus we devise a focal representation module, which comprises an interactive grasp head and a composite score estimation to generate the specific grasp more accurately. Combining all together, the Economic-Grasp framework is proposed. Our extensive experiments show that EconomicGrasp surpasses the SOTA grasp method by about 3AP on average, and with extremely low resource cost, for about 1/4 training time cost, 1/8 memory cost and 1/30 storage cost.
Equity through Access: A Case for Small-scale Deep Learning
Selvan, Raghavendra, Pepin, Bob, Igel, Christian, Samuel, Gabrielle, Dam, Erik B
The recent advances in deep learning (DL) have been accelerated by access to large-scale data and compute. These large-scale resources have been used to train progressively larger models which are resource intensive in terms of compute, data, energy, and carbon emissions. These costs are becoming a new type of entry barrier to researchers and practitioners with limited access to resources at such scale, particularly in the Global South. In this work, we take a comprehensive look at the landscape of existing DL models for vision tasks and demonstrate their usefulness in settings where resources are limited. To account for the resource consumption of DL models, we introduce a novel measure to estimate the performance per resource unit, which we call the PePR score. Using a diverse family of 131 unique DL architectures (spanning 1M to 130M trainable parameters) and three medical image datasets, we capture trends about the performance-resource trade-offs. In applications like medical image analysis, we argue that small-scale, specialized models are better than striving for large-scale models. Furthermore, we show that using pretrained models can significantly reduce the computational resources and data required. We hope this work will encourage the community to focus on improving AI equity by developing methods and models with smaller resource footprints.
Federated Learning While Providing Model as a Service: Joint Training and Inference Optimization
Han, Pengchao, Wang, Shiqiang, Jiao, Yang, Huang, Jianwei
While providing machine learning model as a service to process users' inference requests, online applications can periodically upgrade the model utilizing newly collected data. Federated learning (FL) is beneficial for enabling the training of models across distributed clients while keeping the data locally. However, existing work has overlooked the coexistence of model training and inference under clients' limited resources. This paper focuses on the joint optimization of model training and inference to maximize inference performance at clients. Such an optimization faces several challenges. The first challenge is to characterize the clients' inference performance when clients may partially participate in FL. To resolve this challenge, we introduce a new notion of age of model (AoM) to quantify client-side model freshness, based on which we use FL's global model convergence error as an approximate measure of inference performance. The second challenge is the tight coupling among clients' decisions, including participation probability in FL, model download probability, and service rates. Toward the challenges, we propose an online problem approximation to reduce the problem complexity and optimize the resources to balance the needs of model training and inference. Experimental results demonstrate that the proposed algorithm improves the average inference accuracy by up to 12%.
RHFedMTL: Resource-Aware Hierarchical Federated Multi-Task Learning
Yi, Xingfu, Li, Rongpeng, Peng, Chenghui, Wang, Fei, Wu, Jianjun, Zhao, Zhifeng
The rapid development of artificial intelligence (AI) over massive applications including Internet-of-things on cellular network raises the concern of technical challenges such as privacy, heterogeneity and resource efficiency. Federated learning is an effective way to enable AI over massive distributed nodes with security. However, conventional works mostly focus on learning a single global model for a unique task across the network, and are generally less competent to handle multi-task learning (MTL) scenarios with stragglers at the expense of acceptable computation and communication cost. Meanwhile, it is challenging to ensure the privacy while maintain a coupled multi-task learning across multiple base stations (BSs) and terminals. In this paper, inspired by the natural cloud-BS-terminal hierarchy of cellular works, we provide a viable resource-aware hierarchical federated MTL (RHFedMTL) solution to meet the heterogeneity of tasks, by solving different tasks within the BSs and aggregating the multi-task result in the cloud without compromising the privacy. Specifically, a primal-dual method has been leveraged to effectively transform the coupled MTL into some local optimization sub-problems within BSs. Furthermore, compared with existing methods to reduce resource cost by simply changing the aggregation frequency, we dive into the intricate relationship between resource consumption and learning accuracy, and develop a resource-aware learning strategy for local terminals and BSs to meet the resource budget. Extensive simulation results demonstrate the effectiveness and superiority of RHFedMTL in terms of improving the learning accuracy and boosting the convergence rate.