Tuning of Mixture-of-Experts Mixed-Precision Neural Networks

Tschopp, Fabian

arXiv.org Artificial Intelligence 

Caffe was originally created by Yangqing Jia, Evan Shelhamer, and Jeff Donahue [1]. Initially, Caffe was intended only for CPU and CUDA execution. In 2015 we subsequently developed an OpenCL backend, based on ViennaCL [2], to support a variety of commodity hardware [3-5]. Adoption on commodity hardware, however, has been low. This includes the integrated GPUs present in most modern computers and embedded devices such as the Raspberry Pi [6] and the Asus Tinkerboard [7]. This is partly due to slow inference, which is the task end-user applications typically carry out. A possible usage scenario for our software is to train a network for a robot on a discrete GPU, and then build the robot around a small, energy-efficient embedded system-on-a-chip computer.

In this work, we attempt to increase inference speed on both desktop and mobile GPUs by adding lower-precision (quantized 8/16-bit integer and 16-bit floating point) and mixed-precision networks. Additionally, we demonstrate how mixed-precision networks could potentially be combined with mixture-of-experts techniques to increase inference speed even further.

Important terminology used throughout this work:
BLAS: Basic Linear Algebra Subprograms: matrix-matrix, matrix-vector, matrix-scalar, vector-vector, and vector-scalar operations.
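To make the idea of a quantized network concrete, here is a minimal sketch of how a BLAS matrix-matrix product (GEMM) can be evaluated in low-precision integers. It assumes symmetric, per-tensor linear quantization with a single float scale per matrix and wide integer accumulation. This is a common textbook scheme and an assumption for illustration, not necessarily the scheme used in the Caffe OpenCL backend. The function names `quantize`, `gemm`, and `quantized_gemm` are hypothetical.

```python
# Hypothetical sketch: symmetric per-tensor integer quantization of a GEMM.
# Assumption: this is NOT the Caffe/OpenCL backend's actual implementation;
# it only illustrates computing a float matrix product in 8/16-bit integers.


def quantize(matrix, bits=8):
    """Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for row in matrix for v in row)
    # Float value represented by one integer step (guard against all-zero input).
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def gemm(a, b):
    """Plain matrix-matrix product; works for both float and int matrices."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def quantized_gemm(a, b, bits=8):
    """Approximate a @ b: quantize inputs, multiply in integers, rescale once."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    # On real hardware this accumulation would use int32 registers; Python ints
    # are unbounded, so overflow is not modelled here.
    acc = gemm(qa, qb)
    # A single float rescale after accumulation keeps the inner loop integer-only.
    return [[v * sa * sb for v in row] for row in acc]


if __name__ == "__main__":
    a = [[0.5, -1.25], [2.0, 0.75]]
    b = [[1.0, 0.3], [-0.4, 1.5]]
    exact = gemm(a, b)
    approx8 = quantized_gemm(a, b, bits=8)
    for e_row, q_row in zip(exact, approx8):
        print([round(e, 4) for e in e_row], [round(q, 4) for q in q_row])
```

Passing `bits=16` gives a 16-bit integer variant with a much smaller rounding error. Lower bit widths trade accuracy for less memory traffic and cheaper integer arithmetic, which is the trade-off a mixed-precision network tunes per layer. 16-bit floating point is not modelled in this sketch.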
