NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi

arXiv.org Artificial Intelligence 

Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations either suffer from inferior accuracy or incur considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called NN-LUT, can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
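As a rough illustration of the kind of equivalence the abstract describes, the sketch below shows how a single-hidden-layer ReLU network of one input and one output is piecewise linear and can therefore be rewritten exactly as a table of breakpoints, slopes, and intercepts. This is a minimal assumption-laden example: the network weights are random placeholders rather than weights trained to fit GELU, exp, or rsqrt, and the conversion routine is hypothetical, not code from the paper.

```python
import numpy as np

# Hypothetical 1-hidden-layer ReLU network y(x) = v . relu(w*x + b) + c.
# In NN-LUT such a network would be trained to fit a non-linearity (e.g. GELU);
# here the weights are random, only to demonstrate the NN -> LUT conversion.
rng = np.random.default_rng(0)
H = 8                               # hidden units -> H breakpoints, H+1 segments
w, b, v = rng.normal(size=H), rng.normal(size=H), rng.normal(size=H)
c = rng.normal()

def nn(x):
    """Reference: evaluate the ReLU network directly."""
    return np.maximum(w * x[:, None] + b, 0.0) @ v + c

# Each hidden unit switches at its breakpoint -b_i / w_i, so the network is
# piecewise linear. Build one (slope, intercept) LUT entry per segment.
bp = np.sort(-b / w)
edges = np.concatenate(([-np.inf], bp, [np.inf]))
slopes, intercepts = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    # Pick a representative point inside the segment to read off the active units.
    x_mid = hi - 1.0 if np.isinf(lo) else (lo + 1.0 if np.isinf(hi) else 0.5 * (lo + hi))
    active = (w * x_mid + b) > 0.0
    slopes.append(np.sum(v[active] * w[active]))
    intercepts.append(c + np.sum(v[active] * b[active]))
slopes, intercepts = np.array(slopes), np.array(intercepts)

def lut_eval(x):
    """LUT lookup: find the segment for each x, then apply its linear map."""
    idx = np.searchsorted(bp, x)
    return slopes[idx] * x + intercepts[idx]

# The LUT reproduces the network exactly (up to floating-point error).
x_test = np.linspace(-5.0, 5.0, 1001)
assert np.allclose(nn(x_test), lut_eval(x_test))
```

At inference time only the breakpoint comparison and one multiply-add per input remain, which is why replacing the network with its LUT form reduces area, power, and latency relative to evaluating the original non-linear function.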