Supplementary Material for Fast Vision Transformers with HiLo Attention
Neural Information Processing Systems
Department of Data Science & AI, Monash University, Australia

We organize our supplementary material as follows. In Section A, we describe the architecture specifications of LITv2. In Section B, we derive the computational cost of HiLo attention. In Section C, we study the effect of window size on CIFAR-100. In Section F, we provide more visualisation examples for the spectrum analysis of HiLo attention. Throughout, "ConvFFN" denotes our modified FFN layer, in which we adopt one layer of 3×3 depthwise convolution; we use "ConvFFN Block" to distinguish it from a standard FFN block. The overall framework of LITv2 is depicted in Figure I.
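To make the cost comparison concrete, the following is a minimal sketch (not the paper's exact derivation) of how one might estimate the multiply-accumulate cost of standard multi-head self-attention versus a HiLo-style split, assuming the Hi-Fi branch performs window attention within s×s windows and the Lo-Fi branch average-pools keys/values by a factor of s². The function names and the accounting conventions (one MAC per projection entry, 2·N·M·D per attention score/aggregation pair) are our own illustrative choices.

```python
def msa_flops(n, d):
    """Approximate MACs of standard MSA on n tokens with hidden dim d.

    4*n*d^2 covers the q, k, v and output projections;
    2*n^2*d covers the attention-score and weighted-sum matmuls.
    """
    return 4 * n * d * d + 2 * n * n * d


def hilo_flops(n, d, s, alpha):
    """Approximate MACs of a HiLo-style layer (illustrative accounting).

    n     -- number of tokens (assumed divisible by s*s)
    d     -- total hidden dim
    s     -- window size (Hi-Fi) and pooling factor per side (Lo-Fi)
    alpha -- fraction of channels assigned to the Lo-Fi branch
    """
    d_lo = int(alpha * d)   # Lo-Fi channels
    d_hi = d - d_lo         # Hi-Fi channels

    # Hi-Fi: full q/k/v/out projections plus attention restricted
    # to n/s^2 windows of s^2 tokens each -> 2 * n * s^2 * d_hi.
    hi = 4 * n * d_hi * d_hi + 2 * n * s * s * d_hi

    # Lo-Fi: q and out projections on all n tokens, k/v projections
    # on the pooled n/s^2 tokens, then cross-attention n x n/s^2.
    n_kv = n // (s * s)
    lo = (2 * n * d_lo * d_lo
          + 2 * n_kv * d_lo * d_lo
          + 2 * n * n_kv * d_lo)

    return hi + lo


if __name__ == "__main__":
    # Example: a 14x14 feature map (196 tokens), d = 384, s = 2, alpha = 0.5.
    print(msa_flops(196, 384), hilo_flops(196, 384, 2, 0.5))
```

Under this accounting, the quadratic n² term only survives in the Lo-Fi branch scaled down by s², which is why the savings grow with input resolution.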