Integer-Only Inference for Deep Learning in Native C


Integer-only inference enables the compression of deep learning models for deployment on low-compute, low-latency devices. Many embedded devices are programmed in native C and support neither floating-point operations nor dynamic allocation. Nevertheless, small deep learning models can be deployed to such devices through an integer-only inference pipeline built on uniform quantization and the fixed-point representation. We employed these methods to deploy a deep reinforcement learning (RL) model on a network interface card (NIC) (Tessler et al. 2021[1]). Successfully deploying the RL model required O(microsecond) inference latency on a device with no floating-point support.
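To make the idea concrete, here is a minimal sketch (not the deployed implementation) of an integer-only kernel in plain C. Weights and activations are assumed to have been uniformly quantized offline to int8 with per-tensor scales; the real-valued rescale factor is encoded as a fixed-point multiplier `m0 * 2^-shift`, so the device performs only integer arithmetic. The names `qdot` and `rshift_round` are illustrative, not from the paper.

```c
#include <stdint.h>

/* Rounding right shift of a 64-bit accumulator (round-half-up).
 * Used to apply a fixed-point multiplier without floating point. */
static int64_t rshift_round(int64_t x, int shift) {
    return (x + ((int64_t)1 << (shift - 1))) >> shift;
}

/* Integer-only dot product of int8 vectors, requantized back to int8.
 * m0 and shift encode the real rescale factor M = m0 * 2^-shift,
 * e.g. M = 0.5 as m0 = 1 << 15, shift = 16. */
static int8_t qdot(const int8_t *x, const int8_t *w, int n,
                   int32_t m0, int shift) {
    int32_t acc = 0;                       /* 32-bit accumulator */
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * (int32_t)w[i];
    int64_t y = rshift_round((int64_t)acc * m0, shift);
    if (y < -128) y = -128;                /* saturate to int8 range */
    if (y > 127)  y = 127;
    return (int8_t)y;
}
```

For example, with `x = {10, 20}` and `w = {3, 4}` the integer accumulator is 110; applying a rescale factor of 0.5 (`m0 = 1 << 15`, `shift = 16`) yields 55. Note the kernel uses only static, fixed-size integer arithmetic, matching the no-float, no-malloc constraints described above.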
