Sustainable AI Processing at the Edge

Ollivier, Sébastien, Li, Sheng, Tang, Yue, Chaudhuri, Chayanika, Zhou, Peipei, Tang, Xulong, Hu, Jingtong, Jones, Alex K.

arXiv.org Artificial Intelligence 

Deep neural networks have become a popular algorithm for a variety of applications using mobile devices including smart phones but also recently expanding to connected and autonomous vehicles (CAVs), robotics, or even unmanned aerial vehicles (UAVs), and other smart infrastructure. Convolutional Neural Networks (CNNs) have been demonstrated to provide solutions to these problems with relatively high accuracy. While there have been many proposals to improve the performance and energy efficiency of CNN inference, these algorithms are too compute and data intensive to execute directly on mobile nodes typically operating with limited computational and energy capabilities. Thus, edge servers, now being deployed often in conjunction with advanced (e.g., 5G) wireless networks, have become a popular target to accelerate CNN inference. Moreover, due to their deployment in the field, edge servers must operate under size, weight, and power (SWaP) constraints, while serving many concurrent requests from mobile clients. Thus, to accelerate CNNs, these edge servers often use energy-efficient accelerators, reduced precision, or both to achieve fast response time while balancing requests from multiple clients and maintaining a low operational energy cost. Recently, there has been a trend to push online training to edge server nodes to avoid communicating large datasets from edge to cloud servers [1]. However, online training typically requires much higher precision and floating-point computation compared to inference. Unfortunately, the proliferation of computing, both the mobile devices, and the edge servers themselves, can come at the expense of negative environmental impacts.