Supplementary Materials for M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Neural Information Processing Systems 

The output features of the final ViT block are fed into the decoders for multi-task prediction. The router is a single-layer MLP that maps each token embedding to selection probabilities over the experts. The batch size is 16.

… LUTs, 461K registers, 11 Mbit of block RAM, and 27 Mbit of UltraRAM. It runs at a clock frequency of 1,395 MHz and consumes 295 W of power.
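The router described above, a single-layer MLP producing per-token expert probabilities, can be sketched as one linear layer followed by a softmax. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `router_probs` and `top_k_experts`, the embedding size, the number of experts, the random weights, and the use of top-k selection over the resulting probabilities are all hypothetical choices made for the example.

```python
import math
import random


def router_probs(token, weight, bias):
    """Single-layer MLP router: one linear layer followed by softmax.

    token:  list of d floats (one token embedding)
    weight: num_experts x d matrix as a list of lists (hypothetical parameters)
    bias:   list of num_experts floats
    Returns a probability distribution over the experts.
    """
    # One logit per expert: dot product of the token with that expert's weight row.
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, token)) + b
              for row, b in zip(weight, bias)]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(probs, k):
    """Return indices of the k most probable experts (assumed top-k gating)."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


if __name__ == "__main__":
    random.seed(0)
    d, num_experts = 8, 4  # hypothetical embedding size and expert count
    weight = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(num_experts)]
    bias = [0.0] * num_experts
    token = [random.gauss(0.0, 1.0) for _ in range(d)]
    probs = router_probs(token, weight, bias)
    print([round(p, 3) for p in probs])
    print(top_k_experts(probs, k=2))
```

In a batched setting the same linear map would be applied to every token in the batch independently, so each token can be dispatched to a different subset of experts.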
