Tuning of Mixture-of-Experts Mixed-Precision Neural Networks

Sep-29-2022–arXiv.org Artificial Intelligence

Caffe was originally created by Yangqing Jia, Evan Shelhamer, and Jeff Donahue [1]. Initially, Caffe was intended only for CPU and CUDA usage. In 2015, we developed an OpenCL backend, based on ViennaCL [2], to support a variety of commodity hardware [3-5]. However, adoption on commodity hardware, such as the integrated GPUs present in most modern computers and embedded devices such as the Raspberry Pi [6] and the Asus Tinkerboard [7], has been low. This is partly because inference, a task typically carried out in end-user applications, is too slow on these devices. A possible usage scenario for our software is to train a network for a robot on a discrete GPU and then build the robot around a small, energy-efficient embedded system-on-a-chip computer. In this work, we attempt to increase inference speed on both desktop and mobile GPUs by adding lower-precision (quantized 8/16-bit integer and 16-bit floating-point) and mixed-precision networks. Additionally, we demonstrate how mixed-precision networks could potentially be combined with mixture-of-experts techniques to increase inference speed even further. Important terminology used throughout this work: BLAS: Basic Linear Algebra Subprograms: matrix-matrix, matrix-vector, matrix-scalar, vector-vector and vector-scalar operations.
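To make the idea of 8-bit quantized inference concrete, the following is a minimal sketch of symmetric per-tensor quantization and an integer dot product (the vector-vector BLAS operation mentioned above). It is an illustration under stated assumptions, not the scheme used in Caffe or in this work: the function names `quantize_int8` and `quantized_dot`, the choice of a symmetric scale, and the clamping range of [-127, 127] are all hypothetical.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers with one shared scale (assumed symmetric scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def quantized_dot(qa, scale_a, qb, scale_b):
    """Dot product on integer values, converted back to float once at the end."""
    # On real hardware the accumulator would typically be a wider integer (e.g. 32-bit).
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b


weights = [0.5, -1.0, 0.25]
inputs = [2.0, 1.0, -4.0]

qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(inputs)

approx = quantized_dot(qw, sw, qx, sx)
exact = sum(w * x for w, x in zip(weights, inputs))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The arithmetic inside the loop runs entirely on small integers, which is where lower-precision hardware paths can save time and energy; the price is a small rounding error relative to the floating-point result.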

artificial intelligence, data type, machine learning


