integration

Neural Information Processing Systems 

Current operator library with quantized operators is not feasible for vision transformer inference because of the specific operators including the GeLU activation and layer normalization. Layer normalization (LayerNorm) normalizes the activations of each layer in a neural network independently, reducing internal covariate shift and improving training stability as follows: LayerNorm(x) = γ p Var(x)+ϵ (x µ)+β, (1) where x is the input tensor. We construct surrogate equations with fixed-point interactive methods to calculate the output of the square root operators inspired by I-BERT[3]. We provide the details of how to approximate the square root operators in Algorithm.1. GeLU requires the cumulative distribution function (CDF) of Gaussian distribution, we approximate the activation function by Equation.2[1].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found