AITopics | nn-lut

nn-lut

Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

Dong, Pingcheng, Tan, Yonghao, Zhang, Dong, Ni, Tianwei, Liu, Xuejiao, Liu, Yu, Luo, Peng, Liang, Luhong, Liu, Shih-Yang, Huang, Xijie, Zhu, Huaiyu, Pan, Yun, An, Fengwei, Cheng, Kwang-Ting

arXiv.org Artificial Intelligence, Mar-29-2024

Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require hardware-unfriendly high-precision arithmetic such as FP32/INT32 and do not consider integer-only (INT) quantization. This paper proposes a genetic LUT-approximation algorithm, named GQA-LUT, that can automatically determine the parameters with quantization awareness.

The performance of Transformers benefits greatly from the self-attention mechanism, which captures long-range dependencies well but incurs substantial overhead in computation and memory. Extensive research has been conducted to facilitate the deployment of Transformers on edge devices. Techniques such as lightweight structures integrating convolution and linear attention [4, 5] have emerged, while quantization [6-8] and run-time pruning [9] have become favored approaches to further reduce the hardware burden. However, the optimization of non-linear operations is frequently neglected in Transformer-based models, which can be costly due to ...
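To make the idea of a piece-wise linear look-up table concrete, here is a minimal Python sketch that approximates GELU with one slope and one intercept per segment. It is an illustration only, not the GQA-LUT algorithm: the breakpoints are uniformly spaced and hand-picked (an assumption), everything runs in floating point, and the names `build_pwl_lut`, `pwl_eval`, and `BREAKPOINTS` are hypothetical. The abstract describes a genetic search that chooses the parameters with quantization awareness; that search is not reproduced here.

```python
import bisect
import math


def gelu(x):
    """Exact GELU, used as the reference function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_pwl_lut(fn, breakpoints):
    """Build (breakpoints, slopes, intercepts) by joining fn at adjacent breakpoints."""
    slopes, intercepts = [], []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        k = (fn(hi) - fn(lo)) / (hi - lo)   # slope of the chord on [lo, hi]
        slopes.append(k)
        intercepts.append(fn(lo) - k * lo)  # so that k * lo + intercept == fn(lo)
    return breakpoints, slopes, intercepts


def pwl_eval(lut, x):
    """Look up the segment containing x and evaluate slope * x + intercept."""
    breakpoints, slopes, intercepts = lut
    i = bisect.bisect_right(breakpoints, x) - 1
    # Clamp so inputs outside the table reuse the first or last segment.
    i = max(0, min(i, len(slopes) - 1))
    return slopes[i] * x + intercepts[i]


# Hypothetical choice: 8 uniform segments on [-4, 4].
BREAKPOINTS = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
LUT = build_pwl_lut(gelu, BREAKPOINTS)
```

At run time each evaluation costs one table lookup, one multiply, and one add. With uniform breakpoints the error is largest near zero, where GELU curves most, which is one reason the placement of breakpoints is worth optimizing.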

approximation, gqa-lut, quantization, (13 more...)

2403.19591

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Hong Kong (0.05)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Yu, Joonsang, Park, Junki, Park, Seongmin, Kim, Minsoo, Lee, Sihwa, Lee, Dong Hyun, Choi, Jungwook

arXiv.org Artificial Intelligence, Dec-3-2021

Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or incur considerable hardware cost with long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a LUT. The proposed framework, called NN-LUT, can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
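The statement that a simple network can be "equivalently transformed into a LUT" can be illustrated with a short Python sketch. It assumes, for illustration, a scalar-input network with one hidden ReLU layer; the abstract does not spell out the architecture. Such a network is piece-wise linear with kinks at -b_j / w_j, so each interval between kinks can be stored as one slope and one intercept. The function names and the weights used in the usage note are hypothetical and untrained.

```python
import bisect


def net_forward(x, w, b, v, c):
    """y = sum_j v[j] * relu(w[j] * x + b[j]) + c for a scalar input x."""
    return sum(vj * max(0.0, wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c


def net_to_lut(w, b, v, c):
    """Convert the network into (breakpoints, slopes, intercepts)."""
    # Each hidden unit with a non-zero weight switches on or off at x = -b / w.
    breakpoints = sorted({-bj / wj for wj, bj in zip(w, b) if wj != 0.0})
    if breakpoints:
        # One probe point strictly inside every interval, including both tails.
        probes = [breakpoints[0] - 1.0]
        probes += [(lo + hi) / 2.0 for lo, hi in zip(breakpoints, breakpoints[1:])]
        probes.append(breakpoints[-1] + 1.0)
    else:
        probes = [0.0]  # no kinks: the network is a constant
    slopes, intercepts = [], []
    for p in probes:
        # Within an interval the set of active units is fixed, so the
        # network collapses to a single affine function of x.
        active = [(wj, bj, vj) for wj, bj, vj in zip(w, b, v) if wj * p + bj > 0.0]
        slopes.append(sum(vj * wj for wj, _, vj in active))
        intercepts.append(sum(vj * bj for _, bj, vj in active) + c)
    return breakpoints, slopes, intercepts


def lut_eval(lut, x):
    """Select the interval containing x, then apply its slope and intercept."""
    breakpoints, slopes, intercepts = lut
    i = bisect.bisect_right(breakpoints, x)
    return slopes[i] * x + intercepts[i]
```

For example, with the hypothetical weights `w = [1.0, -1.0, 2.0]`, `b = [0.0, 1.0, -3.0]`, `v = [0.5, -0.25, 1.0]`, `c = 0.1`, the table has kinks at 0, 1, and 1.5, and `lut_eval` agrees with `net_forward` up to floating-point rounding. The conversion is exact rather than approximate, so the accuracy of the table is that of the trained network.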

approximation, nn-lut, non-linear operation, (15 more...)

2112.02191

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)