Neural Information Processing Systems
In computer vision, however, almost all performant networks are "dense", that is, every input is processed by every parameter. We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer that is scalable and competitive with the largest dense networks.
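To make the dense-versus-sparse contrast concrete, here is a minimal sketch of a generic top-k mixture-of-experts layer. A router scores each input, and only the k highest-scoring experts are evaluated, so each input touches only a fraction of the layer's parameters. This is an illustrative assumption, not the V-MoE implementation. The linear router, softmax gating, linear experts, and the names `moe_layer`, `router_w`, and `experts` are all hypothetical choices made for the example.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # each row produces one output dimension


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max logit first)."""
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: Vector, router_w: Matrix, experts: List[Matrix],
              k: int = 1) -> Tuple[Vector, List[int]]:
    """Hypothetical sparse layer: only the top-k experts process input x.

    A dense layer would apply every expert's parameters to every input.
    Here the router picks k experts, and the output is the gate-weighted
    sum of just those experts' outputs.
    """
    gates = softmax(matvec(router_w, x))  # one gate value per expert
    # Indices of the k largest gate values, highest first.
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    out = [0.0] * len(experts[top[0]])
    for i in top:  # experts outside `top` are never evaluated
        y = matvec(experts[i], x)
        out = [o + gates[i] * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    router_w = [[1.0, 0.0], [0.0, 1.0]]          # 2 experts, input dim 2
    experts = [[[2.0, 0.0], [0.0, 2.0]],         # expert 0: scale by 2
               [[-1.0, 0.0], [0.0, -1.0]]]       # expert 1: negate
    out, chosen = moe_layer([3.0, 1.0], router_w, experts, k=1)
    print(chosen, out)
```

With `k=1`, only expert 0 is evaluated for the input `[3.0, 1.0]`, and expert 1's parameters are skipped entirely. Raising `k` trades extra compute for a richer mixture. Real sparse models differ in details such as router design, gate normalization, and per-expert capacity limits, which this sketch deliberately omits.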
Feb-8-2026, 11:45:58 GMT