A Operator integration

Apr-25-2026, 12:22:40 GMT–Neural Information Processing Systems

Current quantized operator libraries cannot be used directly for vision transformer inference, because the model relies on operators they do not cover, notably the GeLU activation and layer normalization. Layer normalization (LayerNorm) normalizes the activations of each layer in a neural network independently, reducing internal covariate shift and improving training stability:

$$\mathrm{LayerNorm}(x) = \frac{\gamma}{\sqrt{\mathrm{Var}(x) + \epsilon}}\,(x - \mu) + \beta, \qquad (1)$$

where $x$ is the input tensor, $\mu$ and $\mathrm{Var}(x)$ are its mean and variance, $\gamma$ and $\beta$ are the learnable scale and shift, and $\epsilon$ is a small constant for numerical stability. Inspired by I-BERT [3], we construct surrogate equations with fixed-point iterative methods to compute the output of the square root operator. The details of the square root approximation are given in Algorithm 1. GeLU requires the cumulative distribution function (CDF) of the Gaussian distribution, so we approximate the activation function with Equation 2 [1].
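The integer-only square root can be illustrated with a minimal Python sketch. It assumes a Newton-style fixed-point iteration in the spirit of I-BERT and is not a reproduction of Algorithm 1; the function names `int_sqrt` and `int_layernorm`, the `shift` parameter, and the omission of $\gamma$, $\beta$, and $\epsilon$ are all simplifications made here for illustration.

```python
def int_sqrt(n: int) -> int:
    """Return floor(sqrt(n)) using only integer adds, shifts and divisions."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Initial guess 2^ceil(bits/2), which is never below sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        # Newton update for f(x) = x^2 - n, in integer arithmetic.
        y = (x + n // x) >> 1
        if y >= x:  # the sequence stopped decreasing: x is the answer
            return x
        x = y


def int_layernorm(q: list[int], shift: int = 8) -> list[int]:
    """Hypothetical integer LayerNorm core: (q - mean) / std, scaled by 2^shift.

    The affine parameters (gamma, beta) and epsilon are left out; a zero
    standard deviation is clamped to 1 to avoid division by zero.
    """
    n = len(q)
    mean = sum(q) // n
    centered = [v - mean for v in q]
    var = sum(c * c for c in centered) // n
    std = max(int_sqrt(var), 1)
    return [(c << shift) // std for c in centered]
```

With `shift=8`, the outputs are fixed-point values with 8 fractional bits, so an input that lies one standard deviation above the mean maps to 256.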

artificial intelligence, machine learning, search space

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
