Scaling Vision with Sparse Mixture of Experts
Carlos Riquelme (Google Brain), Joan Puigcerver* (Google Brain), Basil Mustafa (Google Brain)

Neural Information Processing Systems 

We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer that is scalable and competitive with the largest dense networks. When applied to image recognition, V-MoE matches the performance of state-of-the-art networks while requiring as little as half of the compute at inference time.
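To make the "sparse" part concrete, the sketch below illustrates the general idea behind a sparse Mixture-of-Experts layer of the kind the abstract refers to: a learned router sends each token to only its top-k experts, so only a fraction of the layer's parameters is used per token. This is a minimal illustration under stated assumptions, not the authors' implementation; the function and parameter names (`moe_layer`, `num_experts`, `top_k`) are chosen for this example, and the experts are evaluated densely here purely for clarity.

```python
# Minimal sketch of top-k token routing in a Mixture-of-Experts layer (JAX).
# Illustrative only; a real sparse implementation dispatches tokens so that
# each expert processes only the tokens routed to it.
import jax
import jax.numpy as jnp

def moe_layer(params, tokens, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    params: dict with
      'gate'  : (d_model, num_experts) router weights
      'w_in'  : (num_experts, d_model, d_hidden) expert MLP input weights
      'w_out' : (num_experts, d_hidden, d_model) expert MLP output weights
    tokens: (num_tokens, d_model)
    """
    # Router: linear layer followed by a softmax over experts.
    logits = tokens @ params['gate']                       # (tokens, experts)
    probs = jax.nn.softmax(logits, axis=-1)

    # Keep only the top-k experts per token; renormalise their weights.
    top_p, top_idx = jax.lax.top_k(probs, top_k)           # (tokens, k)
    top_p = top_p / jnp.sum(top_p, axis=-1, keepdims=True)

    # Each expert is a small two-layer MLP (computed densely for illustration).
    def expert_fn(w_in, w_out):
        return jax.nn.gelu(tokens @ w_in) @ w_out          # (tokens, d_model)

    expert_out = jax.vmap(expert_fn)(params['w_in'], params['w_out'])
    # expert_out: (experts, tokens, d_model)

    # Gather each token's selected experts and combine them with router weights.
    selected = jnp.take_along_axis(
        jnp.swapaxes(expert_out, 0, 1),                    # (tokens, experts, d)
        top_idx[..., None], axis=1)                        # (tokens, k, d)
    return jnp.sum(top_p[..., None] * selected, axis=1)    # (tokens, d_model)

# Tiny usage example with random weights.
key = jax.random.PRNGKey(0)
d_model, d_hidden, num_experts, num_tokens = 8, 16, 4, 10
k1, k2, k3, k4 = jax.random.split(key, 4)
params = {
    'gate': jax.random.normal(k1, (d_model, num_experts)),
    'w_in': jax.random.normal(k2, (num_experts, d_model, d_hidden)),
    'w_out': jax.random.normal(k3, (num_experts, d_hidden, d_model)),
}
tokens = jax.random.normal(k4, (num_tokens, d_model))
print(moe_layer(params, tokens).shape)  # (10, 8)
```

Because only k experts run per token, the active compute per token stays roughly constant as the total number of experts (and hence parameters) grows, which is what allows a sparse model to match dense baselines at a fraction of the inference cost.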
