Augmented Shortcuts for Vision Transformers
Transformer models have recently achieved great progress on computer vision tasks. The rapid development of vision transformers is mainly attributed to their high representation ability for extracting informative features from input images. However, the mainstream transformer models are designed with deep architectures, and the feature diversity is continuously reduced as the depth increases, i.e., feature collapse. In this paper, we theoretically analyze the feature collapse phenomenon and study the relationship between shortcuts and feature diversity in these transformer models. We then present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel with the original shortcuts. To save computational cost, we further explore an efficient approach that uses block-circulant projection to implement the augmented shortcuts. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method, which improves the accuracy of state-of-the-art vision transformers by about 1% without noticeably increasing their parameters and FLOPs.
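The mechanism described above can be sketched in a few lines: alongside the identity shortcut, extra parallel paths apply learnable projections to the block input, and each projection is parameterized as a circulant matrix so that its matrix-vector product reduces to an FFT, cutting the cost from O(d^2) to O(d log d). The names and shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c with x,
    using the FFT identity C @ x = ifft(fft(c) * fft(x))."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def augmented_shortcut(x, module_out, path_params):
    """Combine a module's output (e.g., MSA or MLP) with the original
    identity shortcut plus extra learnable parallel paths.

    x           : (d,) input token feature
    module_out  : (d,) output of the MSA/MLP module for this token
    path_params : list of (d,) vectors, each defining one circulant path
    """
    out = module_out + x          # original residual shortcut
    for c in path_params:         # augmented parallel paths
        out = out + circulant_matvec(c, x)
    return out
```

In practice the per-path parameters would be learned jointly with the rest of the network; this sketch only shows why the circulant structure keeps the extra paths cheap.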
Augmented Shortcuts for Vision Transformers (Supplementary Material)
Yehui Tang
Since original shortcut connections exist in both the MSA and MLP modules, the proposed augmented shortcuts are also embedded into the MLP module (Eq. 10 in the main paper).
PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation
Wang, Yunhe, Chen, Hanting, Tang, Yehui, Guo, Tianyu, Han, Kai, Nie, Ying, Wang, Xutao, Hu, Hailin, Bai, Zheyuan, Wang, Yun, Liu, Fangcheng, Liu, Zhicheng, Guo, Jianyuan, Zeng, Sinan, Zhang, Yinchen, Xu, Qinghua, Liu, Qun, Yao, Jun, Xu, Chao, Tao, Dacheng
Abstract--The recent trend in large language models (LLMs) is to increase both model size (i.e., the number of parameters) and dataset scale to achieve better generative ability, as demonstrated by works such as the famous GPT and Llama. However, large models often involve massive computational costs that practical applications cannot afford, and the method of constructing a strong model architecture for LLMs is rarely discussed. We first analyze the state-of-the-art language model architectures and observe the feature collapse problem. Based on the theoretical analysis, we propose that nonlinearity, which is usually studied for convolutional neural networks in vision tasks, is also very important for language models. A series informed activation function is then introduced with negligible extra computation, and an augmented shortcut is further used to enhance the model nonlinearity. We demonstrate through carefully designed ablations that the proposed approach is significantly effective for enhancing model nonlinearity; thus, we present a new efficient architecture for modern language models, namely PanGu-π. Experiments are then conducted using the same dataset and training strategy to compare PanGu-π with state-of-the-art LLMs. The results show that PanGu-π-7B achieves performance comparable to benchmark models with about 10% inference speed-up, and PanGu-π-1B achieves state-of-the-art performance in terms of both accuracy and efficiency. In addition, we have deployed PanGu-π-7B in the high-value domains of finance and law, developing an LLM named YunShan for practical application. The results show that YunShan surpasses other models of similar scale on benchmarks. As shown in Figure 1, YunShan supports tasks such as translation, text summarization, and dialogue.
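The series informed activation mentioned above can be illustrated as a weighted sum of shifted copies of a base activation, which adds nonlinearity at negligible extra cost. The coefficients `a` and offsets `b` below stand in for learnable parameters; this is a minimal sketch under that assumption, not the paper's exact parameterization.

```python
import numpy as np

def series_activation(x, a, b, base=lambda z: np.maximum(z, 0.0)):
    """Apply sum_i a[i] * base(x + b[i]) elementwise.

    x    : input array
    a, b : sequences of scalar weights and shifts (learnable in practice)
    base : base activation, ReLU by default
    """
    return sum(ai * base(x + bi) for ai, bi in zip(a, b))
```

With `a = [1.0]` and `b = [0.0]` this reduces to the base activation; adding more shifted terms gives the function more "kinks" and hence more nonlinearity per layer.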