Shi, Yingdong
Discovering Influential Neuron Path in Vision Transformers
Wang, Yifan, Liu, Yifei, Shi, Yingdong, Li, Changming, Pang, Anqi, Yang, Sibei, Yu, Jingyi, Ren, Kan
Vision Transformer models exhibit immense power yet remain opaque to human understanding, posing challenges and risks for practical applications. While prior research has attempted to demystify these models through input attribution and neuron role analysis, there has been a notable gap in considering layer-level information and the holistic path of information flow across layers. In this paper, we investigate the significance of influential neuron paths within vision Transformers, where a neuron path is a sequence of neurons from the model input to the output that most significantly impacts the model inference. We first propose a joint influence measure to assess the contribution of a set of neurons to the model outcome. We further provide a layer-progressive neuron locating approach that efficiently selects the most influential neuron at each layer, aiming to discover the crucial neuron path from input to output within the target model. Our experiments demonstrate that our method outperforms existing baseline solutions in finding the most influential neuron path along which the information flows. Additionally, the discovered neuron paths illustrate that vision Transformers exhibit specific inner working mechanisms for processing visual information within the same image category. We further analyze the key effects of these neurons on the image classification task, showing that the found neuron paths preserve the model capability on downstream tasks, which may also shed light on real-world applications such as model pruning.

Transformer (Vaswani et al., 2017) models in the vision domain, such as supervised Vision Transformers (ViT) (Dosovitskiy et al., 2021) or self-supervised pretrained models (He et al., 2022; Oquab et al., 2023), have showcased remarkable performance in various real-world tasks such as image classification (Dosovitskiy et al., 2021) and image synthesis (Peebles & Xie, 2023). However, despite these impressive achievements, the inner workings of vision Transformer models remain elusive. Understanding the internal mechanisms of vision models is crucial for both research and practical applications. Confronted with a model's decision outputs, one may ask: how does the vision Transformer model process the input information layer by layer, and which parts of the model are decisive in deriving the final outcome? Unraveling the synergy within these models is essential for comprehending machine learning systems.
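The layer-progressive search described in the abstract can be illustrated with a small, self-contained sketch. The toy MLP-block model, the ablation-based influence score, and the greedy per-layer selection below are illustrative assumptions for exposition only, not the paper's exact joint influence measure or architecture.

```python
# Hypothetical sketch: layer-progressive greedy selection of an influential
# neuron path. The influence score used here (drop in the target logit when
# the selected neurons are ablated) is an illustrative stand-in.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """A minimal MLP block standing in for a Transformer FFN layer."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, ablate_idx=None):
        h = self.act(self.fc1(x))
        if ablate_idx is not None:
            h = h.clone()
            h[..., ablate_idx] = 0.0          # ablate the selected hidden neurons
        return x + self.fc2(h)

class ToyModel(nn.Module):
    def __init__(self, dim=16, hidden=32, depth=4, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([ToyBlock(dim, hidden) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, ablation=None):
        # ablation: dict {layer_index: list of hidden-neuron indices to zero}
        ablation = ablation or {}
        for i, blk in enumerate(self.blocks):
            x = blk(x, ablation.get(i))
        return self.head(x.mean(dim=1))

@torch.no_grad()
def joint_influence(model, x, target, ablation):
    """Influence of an ablated neuron set: drop in the target-class logit."""
    full = model(x)[0, target]
    ablated = model(x, ablation)[0, target]
    return (full - ablated).item()

@torch.no_grad()
def find_neuron_path(model, x, target):
    """Greedily pick, layer by layer, the neuron whose ablation (jointly with
    the neurons already chosen) influences the model output the most."""
    path, chosen = [], {}
    for layer in range(len(model.blocks)):
        hidden = model.blocks[layer].fc1.out_features
        best_idx, best_score = None, float("-inf")
        for j in range(hidden):
            trial = {**chosen, layer: chosen.get(layer, []) + [j]}
            score = joint_influence(model, x, target, trial)
            if score > best_score:
                best_idx, best_score = j, score
        chosen[layer] = chosen.get(layer, []) + [best_idx]
        path.append((layer, best_idx, best_score))
    return path

model = ToyModel()
x = torch.randn(1, 8, 16)                     # (batch, tokens, dim)
print(find_neuron_path(model, x, target=3))
```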
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yang, Yuchen, Shi, Yingdong, Wang, Cheems, Zhen, Xiantong, Shi, Yuxuan, Xu, Jun
Fine-tuning pretrained large models on downstream tasks is an important problem, which, however, suffers from huge memory overhead due to the large number of parameters. This work strives to reduce memory overhead in fine-tuning from the perspectives of the activation function and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which establishes the theoretical feasibility of decoupling the forward and backward passes. We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives to the GELU and SiLU activation functions, which use the derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers, thereby removing redundancy in activation memory usage. Our method neither induces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results demonstrate that our proposal reduces peak memory usage by up to $\sim 30\%$. Our code is released at https://github.com/yyyyychen/LowMemoryBP.
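A minimal sketch of the activation-side idea, assuming a PyTorch custom autograd function: the forward pass computes the exact GELU, while the backward pass applies a ReLU-style derivative, so only a boolean mask has to be retained for backpropagation instead of the full-precision input. This is a simplified stand-in for illustration, not the repository's exact implementation.

```python
# Simplified sketch (not the released implementation): GELU forward,
# ReLU-derivative backward, storing only a 1-bit-per-element mask.
import torch

class MemoryEfficientGELU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = torch.nn.functional.gelu(x)   # exact GELU in the forward pass
        ctx.save_for_backward(x > 0)      # keep only a boolean mask, not x itself
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        return grad_output * mask         # ReLU derivative as the approximation

# Usage example on a toy tensor.
x = torch.randn(4, 8, requires_grad=True)
MemoryEfficientGELU.apply(x).sum().backward()
print(x.grad.shape)
```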