
Collaborating Authors

 Deng, Haoyu


Unveiling and Controlling Anomalous Attention Distribution in Transformers

arXiv.org Artificial Intelligence

With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the attention mechanism: attention is disproportionately concentrated on the first element of the sequence, a pattern prevalent across Transformer-based models. Understanding it is crucial for techniques that depend on the attention distribution, such as Key-Value (KV) Cache compression and infinite-length extrapolation; however, its underlying cause remains unknown. In this paper, we analyze this phenomenon from the perspective of the waiver phenomenon, in which the internal values of certain elements in the sequence are reduced, allowing them to absorb excess attention without affecting their contribution to the information. In specific models, owing to differences in positional encoding and attention patterns, we find that the model selects waiver elements in one of two ways: based on positional encoding, or based on the feature distribution within elements.
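The abstract does not include code, but the waiver behaviour it describes can be illustrated with a minimal sketch of scaled dot-product attention (not the paper's method; the toy tensors and the way element 0 is constructed are assumptions): the first element's key attracts attention from every query while its near-zero value contributes almost nothing to the output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, d = 6, 8

# Queries, keys, and values for a toy sequence (hypothetical data).
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
v = torch.randn(seq_len, d)

# Emulate a "waiver" element at position 0: its key aligns with the queries
# (so it attracts excess attention) while its value is zero (so it adds
# essentially no information to the weighted sum).
k[0] = q.mean(dim=0) * 3.0   # strongly attended key
v[0] = torch.zeros(d)        # negligible information content

attn = F.softmax(q @ k.T / d ** 0.5, dim=-1)   # scaled dot-product attention
out = attn @ v

print("attention mass on element 0:", attn[:, 0])   # disproportionately high
print("norm of element 0's value:", v[0].norm())    # ~0, so the output is unaffected
```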


TCJA-SNN: Temporal-Channel Joint Attention for Spiking Neural Networks

arXiv.org Artificial Intelligence

Spiking Neural Networks (SNNs) are attracting widespread interest due to their biological plausibility, energy efficiency, and powerful spatio-temporal information representation ability. Given the critical role of attention mechanisms in enhancing neural network performance, integrating SNNs with attention mechanisms has the potential to deliver energy-efficient, high-performance computing paradigms. We present a novel Temporal-Channel Joint Attention mechanism for SNNs, referred to as TCJA-SNN. The proposed TCJA-SNN framework can effectively assess the significance of spike sequences from both spatial and temporal dimensions. More specifically, our essential technical contributions are: 1) We employ a squeeze operation to compress the spike stream into an average matrix, and then leverage two local attention mechanisms based on efficient 1D convolutions to extract features at the temporal and channel levels independently. 2) We introduce the Cross Convolutional Fusion (CCF) layer as a novel approach to model the inter-dependencies between the temporal and channel scopes; this layer breaks the independence of the two dimensions and enables interaction between their features. Experimental results demonstrate that the proposed TCJA-SNN outperforms the state of the art (SOTA) by up to 15.7% in accuracy on standard static and neuromorphic datasets, including Fashion-MNIST, CIFAR10-DVS, N-Caltech 101, and DVS128 Gesture. Furthermore, we apply the TCJA-SNN framework to image generation tasks by leveraging a variational autoencoder. To the best of our knowledge, this study is the first instance in which an SNN attention mechanism has been employed for both image classification and generation tasks. Notably, our approach achieves SOTA performance in both domains, marking a significant advancement in the field. Code is available at https://github.com/ridgerchu/TCJA.
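The authors' implementation is available at the repository above; as a rough, non-authoritative illustration of the steps named in the abstract, the sketch below squeezes a spike tensor into an average matrix and applies two 1D-convolutional attentions along the temporal and channel axes. The tensor layout, kernel size, and the simple multiplicative fusion (standing in for the paper's CCF layer) are assumptions.

```python
import torch
import torch.nn as nn

class TCJASketch(nn.Module):
    """Rough sketch of temporal-channel joint attention for a spike tensor
    shaped [T, B, C, H, W]. Kernel sizes and the fusion rule are assumptions;
    see https://github.com/ridgerchu/TCJA for the authors' implementation."""

    def __init__(self, channels: int, timesteps: int, kernel_size: int = 3):
        super().__init__()
        # 1D convolutions acting along the temporal and channel axes.
        self.conv_t = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.conv_c = nn.Conv1d(timesteps, timesteps, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T, B, C, H, W = x.shape
        # Squeeze: average over the spatial dimensions -> [B, C, T].
        z = x.mean(dim=(3, 4)).permute(1, 2, 0)
        # Temporal-level attention: convolve along T for each channel.
        a_t = self.conv_t(z)                               # [B, C, T]
        # Channel-level attention: convolve along C for each time step.
        a_c = self.conv_c(z.transpose(1, 2))               # [B, T, C]
        # Simple multiplicative fusion standing in for the CCF layer.
        score = torch.sigmoid(a_t * a_c.transpose(1, 2))   # [B, C, T]
        # Broadcast the joint attention scores back over the spike tensor.
        return x * score.permute(2, 0, 1)[..., None, None]

# Usage: recalibrate a toy spike tensor with 4 time steps and 8 channels.
x = (torch.rand(4, 2, 8, 16, 16) > 0.5).float()
y = TCJASketch(channels=8, timesteps=4)(x)
print(y.shape)  # torch.Size([4, 2, 8, 16, 16])
```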


Tensor Decomposition Based Attention Module for Spiking Neural Networks

arXiv.org Artificial Intelligence

The attention mechanism has proven to be an effective way to improve spiking neural networks (SNNs). However, although current SNN input data streams are split into tensors for processing on GPUs, none of the previous works considers the properties of tensors when implementing an attention module. This inspires us to rethink current SNNs from the perspective of tensor-related theories. Using tensor decomposition, we design the \textit{projected full attention} (PFA) module, which demonstrates excellent results with linearly growing parameters. Specifically, PFA is composed of the \textit{linear projection of spike tensor} (LPST) module and the \textit{attention map composing} (AMC) module. In LPST, we first compress the original spike tensor into three projected tensors using a single property-preserving strategy with learnable parameters for each dimension. Then, in AMC, we exploit the inverse procedure of tensor decomposition to combine the three tensors into the attention map using a so-called connecting factor. To validate the effectiveness of the proposed PFA module, we integrate it into the widely used VGG and ResNet architectures for classification tasks. Our method achieves state-of-the-art performance on both static and dynamic benchmark datasets, surpassing existing SNN models with Transformer-based and CNN-based backbones.
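As a hedged illustration of the idea described in the abstract (not the authors' PFA code), the sketch below produces one learnable factor matrix per dimension of a spike tensor and recombines them through a CP-style reconstruction weighted by a learnable connecting factor. The tensor layout [T, B, C, N], the rank, the mean-based compression, and the module name are all assumptions.

```python
import torch
import torch.nn as nn

class PFASketch(nn.Module):
    """Illustrative projected-full-attention-style module for a spike tensor
    shaped [T, B, C, N] (N = flattened spatial size). The rank R and the
    CP-style reconstruction stand in for the LPST / AMC steps of the paper."""

    def __init__(self, timesteps: int, channels: int, spatial: int, rank: int = 4):
        super().__init__()
        # LPST-like step: learnable projections giving one factor matrix per dimension.
        self.proj_t = nn.Linear(timesteps, timesteps * rank)
        self.proj_c = nn.Linear(channels, channels * rank)
        self.proj_n = nn.Linear(spatial, spatial * rank)
        # AMC-like step: a learnable "connecting factor" weighting each rank-1 term.
        self.lam = nn.Parameter(torch.ones(rank))
        self.rank = rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T, B, C, N = x.shape
        # Compress the spike tensor into one mean vector per dimension.
        m_t = x.mean(dim=(2, 3)).transpose(0, 1)   # [B, T]
        m_c = x.mean(dim=(0, 3))                   # [B, C]
        m_n = x.mean(dim=(0, 2))                   # [B, N]
        # Project into rank-R factor matrices.
        u_t = self.proj_t(m_t).view(B, T, self.rank)
        u_c = self.proj_c(m_c).view(B, C, self.rank)
        u_n = self.proj_n(m_n).view(B, N, self.rank)
        # Inverse of the decomposition: sum of weighted outer products.
        attn = torch.einsum('r,btr,bcr,bnr->btcn', self.lam, u_t, u_c, u_n)
        attn = torch.sigmoid(attn)                  # attention map over (T, C, N)
        return x * attn.permute(1, 0, 2, 3)         # broadcast back to [T, B, C, N]

# Usage on a toy spike tensor: 4 time steps, 8 channels, 16 spatial positions.
x = (torch.rand(4, 2, 8, 16) > 0.5).float()
y = PFASketch(timesteps=4, channels=8, spatial=16)(x)
print(y.shape)  # torch.Size([4, 2, 8, 16])
```

The parameter count of the three projections grows linearly with each dimension's size, which is the property the abstract attributes to PFA; the full-tensor attention map is only ever formed implicitly through the rank-R factors.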