Fixed-point quantization aware training for on-device keyword-spotting

Sashank Macha, Om Oza, Alex Escott, Francesco Caliva, Robbie Armitano, Santosh Kumar Cheekatmalla, Sree Hari Krishnan Parthasarathi, Yuzong Liu

arXiv.org Artificial Intelligence 

Fixed-point (FXP) inference has proven suitable for embedded devices with limited computational resources, and yet model training is continually performed in floating-point (FLP). FXP training has not been fully explored, and the non-trivial conversion from FLP to FXP presents an unavoidable performance drop. We propose a novel method to train and obtain FXP convolutional keyword-spotting (KWS) models. We combine our methodology with two quantization-aware training (QAT) techniques: squashed weight distribution and absolute cosine regularization for model parameters, and we propose techniques for extending QAT over transient variables, otherwise neglected by previous paradigms. We report experimental results on the Google Speech Commands v2 dataset.

Computational requirements can be reduced further using low-precision inference via quantization, which allows increased operations per accessed memory byte [5, 7]. Such quantization is typically achieved by means of post-training quantization (PTQ) [8], which however causes severe information loss affecting model accuracy. To maintain overall accuracy for quantized DNNs, quantization can be incorporated in the training phase, leading to quantization-aware training (QAT). QAT introduces quantization noise during training by means of deterministic rounding [9, 10, 11], reparametrization [12, 13], or regularization [14, 15], among other techniques, allowing DNNs to adapt to inference quantization. Notable work has shown that with QAT, model parameters can be learned at binary and ternary precision [16, 17].
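This excerpt does not include an implementation, so the snippet below is only a minimal PyTorch sketch of the generic QAT mechanism described above: deterministic rounding applied through a straight-through estimator (STE), covering both weights and a transient activation. The class names FakeQuantSTE and QuantConv1d, the 8-bit default, and the per-tensor max-based scaling are illustrative assumptions, not the authors' squashed-weight-distribution or absolute-cosine-regularization method.

```python
import torch
import torch.nn as nn


class FakeQuantSTE(torch.autograd.Function):
    """Round a tensor onto a signed fixed-point grid in the forward pass;
    pass gradients through unchanged in the backward pass (STE)."""

    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1                     # e.g. 127 for 8-bit
        scale = x.detach().abs().max().clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
        return q * scale                                   # de-quantized ("fake" quant)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                           # straight-through gradient


class QuantConv1d(nn.Conv1d):
    """Conv layer whose weights and input activations (transient variables)
    are fake-quantized during training, so the network adapts to the
    rounding noise it will see at fixed-point inference."""

    def __init__(self, *args, num_bits=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.weight, self.num_bits)
        x_q = FakeQuantSTE.apply(x, self.num_bits)
        return self._conv_forward(x_q, w_q, self.bias)
```

In such a setup, QuantConv1d would simply replace nn.Conv1d in the KWS model during training; at deployment the rounded integer weights and their scales could be exported for FXP inference.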
