
Collaborating Authors

 Pan, Yun


Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

arXiv.org Artificial Intelligence

Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require unfriendly high-precision arithmetics such as FP/INT32 and lack consideration of integer-only INT quantization. This paper proposes a genetic LUT-Approximation algorithm, namely GQA-LUT, that can automatically determine the parameters with quantization awareness.

The performance greatly benefits from the self-attention mechanism in Transformers, which can capture long-range dependencies well, but with a substantial overhead in computation and memory. Extensive research has been conducted to facilitate the deployment of Transformers on edge devices. Techniques like lightweight structures integrating convolution and linear attention [4, 5] emerge, while quantization [6-8] and run-time pruning [9] have become favored approaches to further reduce the hardware burden. However, the optimization of non-linear operations is frequently neglected in Transformer-based models, which can be costly due to
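
The abstract above refers to approximating non-linear operations with piece-wise linear segments whose parameters are stored in a look-up table. The sketch below is a minimal, illustrative Python example of that general idea applied to GELU: it builds a small table of segment slopes and intercepts by chord fitting over a fixed range. The 8-segment table, the range [-4, 4], and the chord fit are assumptions made for demonstration only; they are not the GQA-LUT method, which instead searches such LUT parameters with a genetic algorithm under integer-only quantization constraints.

```python
import numpy as np

def gelu(x):
    # Reference non-linear function (tanh approximation of GELU).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Hypothetical LUT: segment breakpoints and per-segment (slope, intercept).
breakpoints = np.linspace(-4.0, 4.0, 9)  # 8 segments over [-4, 4]
slopes, intercepts = [], []
for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
    # Naive chord fit: connect the segment endpoints with a straight line.
    k = (gelu(hi) - gelu(lo)) / (hi - lo)
    slopes.append(k)
    intercepts.append(gelu(lo) - k * lo)
slopes, intercepts = np.array(slopes), np.array(intercepts)

def pwl_gelu(x):
    # Clamp input to the covered range, look up the segment, apply y = k*x + b.
    xc = np.clip(x, breakpoints[0], breakpoints[-1] - 1e-6)
    idx = np.searchsorted(breakpoints, xc, side="right") - 1
    return slopes[idx] * xc + intercepts[idx]

# Quick check of the approximation error over the covered range.
x = np.linspace(-4.0, 4.0, 1001)
print("max abs error:", np.max(np.abs(pwl_gelu(x) - gelu(x))))
```

In a quantization-aware setting, the inputs, breakpoints, slopes, and intercepts would additionally be constrained to low-precision integer grids, which is the scenario the paper targets.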