Multi-Scale VMamba: Hierarchy in Hierarchy Visual State Space Model

Neural Information Processing Systems 

Despite the significant achievements of Vision Transformers (ViTs) in various vision tasks, they are constrained by the quadratic complexity of self-attention with respect to the number of image tokens.