


Fast Vision Transformers with HiLo Attention

Neural Information Processing Systems

Vision Transformers (ViTs) have driven the most recent and significant breakthroughs in computer vision. Their efficient designs are mostly guided by the indirect metric of computational complexity, i.e., FLOPs, which, however, has a clear gap with direct metrics such as throughput. Thus, we propose to use direct speed evaluation on the target platform as the design principle for efficient ViTs. In particular, we introduce LITv2, a simple and effective ViT that performs favourably against existing state-of-the-art methods across a spectrum of model sizes while running faster. At the core of LITv2 is a novel self-attention mechanism, which we dub HiLo. HiLo is inspired by the insight that high frequencies in an image capture local fine details while low frequencies capture global structures, whereas a standard multi-head self-attention layer neglects these distinct frequency characteristics.
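The abstract describes HiLo's core idea: splitting the attention heads into a high-frequency path (local attention within small windows) and a low-frequency path (every query attends to average-pooled keys/values). The following NumPy sketch illustrates that two-path structure under our own simplifying assumptions; the function name, the random projection weights, and the single-head-per-path treatment are ours for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hilo_attention(x, num_heads=4, alpha=0.5, window=2, seed=0):
    """Illustrative HiLo sketch (not the official code).

    x: feature map of shape (H, W, C). alpha is the fraction of heads
    assigned to the low-frequency (Lo-Fi) path; the rest form the
    high-frequency (Hi-Fi) path.
    """
    H, W, C = x.shape
    rng = np.random.default_rng(seed)  # random weights, for illustration only
    lo_heads = int(alpha * num_heads)
    hi_heads = num_heads - lo_heads
    head_dim = C // num_heads

    def attend(q, k, v):
        scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
        return softmax(scores) @ v

    outs = []
    if hi_heads:
        # Hi-Fi path: self-attention inside non-overlapping windows,
        # capturing local fine details.
        d = hi_heads * head_dim
        Wq, Wk, Wv = (rng.standard_normal((C, d)) * 0.02 for _ in range(3))
        xw = x.reshape(H // window, window, W // window, window, C)
        xw = xw.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)
        out = attend(xw @ Wq, xw @ Wk, xw @ Wv)  # batched per-window attention
        out = out.reshape(H // window, W // window, window, window, d)
        out = out.transpose(0, 2, 1, 3, 4).reshape(H, W, d)
        outs.append(out)
    if lo_heads:
        # Lo-Fi path: all queries attend to average-pooled keys/values,
        # capturing global structure at reduced cost.
        d = lo_heads * head_dim
        Wq, Wk, Wv = (rng.standard_normal((C, d)) * 0.02 for _ in range(3))
        pooled = x.reshape(H // window, window, W // window, window, C)
        pooled = pooled.mean(axis=(1, 3)).reshape(-1, C)
        q = x.reshape(-1, C) @ Wq
        out = attend(q, pooled @ Wk, pooled @ Wv).reshape(H, W, d)
        outs.append(out)
    # Concatenate the two paths along the channel dimension.
    return np.concatenate(outs, axis=-1)
```

Because the Lo-Fi keys/values are pooled down to one token per window, its attention matrix shrinks by a factor of `window**2`, which is where HiLo's speedup over full self-attention comes from.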


Supplementary Material for Fast Vision Transformers with HiLo Attention

Neural Information Processing Systems

Department of Data Science & AI, Monash University, Australia. We organize our supplementary material as follows. In Section A, we describe the architecture specifications of LITv2. In Section B, we provide the derivation of the computational cost of HiLo attention. In Section C, we study the effect of window size on CIFAR-100. In Section F, we provide more visualisation examples for the spectrum analysis of HiLo attention. "ConvFFN" denotes our modified FFN layer, in which we adopt one convolutional layer; we use "ConvFFN Block" to differentiate it from the standard FFN block. The overall framework of LITv2 is depicted in Figure 1.
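The supplementary derives the computational cost of HiLo attention. A back-of-the-envelope version consistent with the two-path description is sketched below; the symbols (N = HW tokens, channel dimension D, window size s, Lo-Fi head ratio α) are our notation, the linear projections are ignored, and the exact constants should be taken from the paper's own derivation.

```latex
% Hi-Fi: self-attention inside N/s^2 non-overlapping s x s windows.
% Lo-Fi: all N queries attend to N/s^2 average-pooled keys/values.
\begin{aligned}
\Omega(\text{Hi-Fi}) &= \frac{N}{s^2}\cdot 2\,(s^2)^2\,(1-\alpha)D
                      = 2\,s^2 N\,(1-\alpha)D,\\
\Omega(\text{Lo-Fi}) &= 2\,N\cdot\frac{N}{s^2}\cdot \alpha D,\\
\Omega(\text{HiLo})  &= 2ND\!\left[(1-\alpha)\,s^2 + \alpha\,\frac{N}{s^2}\right].
\end{aligned}
```

The Hi-Fi term is linear in N for a fixed window size, and the Lo-Fi term shrinks the quadratic cost of full attention by a factor of s², which is why larger windows trade accuracy for speed in the Section C study.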

