Supplementary Materials for M3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Aug-18-2025, 02:29:15 GMT–Neural Information Processing Systems

The final ViT block's output features are fed into the decoders for multi-task prediction. The router is a single-layer MLP that maps each token embedding to the experts' selection probabilities. The batch size is 16. [...] LUTs, 461K registers, 11 Mbit of block RAM, and 27 Mbit of UltraRAM. [...] It runs at a clock frequency of 1,395 MHz and consumes 295 W of power.
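
The router described above can be illustrated with a minimal sketch in plain Python. It assumes the single-layer MLP is one linear map followed by a softmax; the summary does not state the normalization, the initialization, or how experts are picked from the probabilities, so the softmax, the random weights, and the top-k selection step are all assumptions. The names `make_router`, `route`, and `top_k_experts` are hypothetical and are not taken from the authors' code.

```python
import math
import random


def make_router(embed_dim, num_experts, seed=0):
    """Build toy router parameters: one weight row per expert plus a bias.

    Random initialisation is an assumption made only for illustration.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(embed_dim)
    weights = [
        [rng.uniform(-scale, scale) for _ in range(embed_dim)]
        for _ in range(num_experts)
    ]
    bias = [0.0] * num_experts
    return weights, bias


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(token, weights, bias):
    """Map one token embedding to a probability for each expert."""
    logits = [
        sum(w * x for w, x in zip(row, token)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def top_k_experts(probs, k):
    """Return indices of the k most probable experts (assumed selection rule)."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


# Toy usage: an 8-dimensional token routed over 4 experts.
weights, bias = make_router(embed_dim=8, num_experts=4)
token = [0.1 * i for i in range(8)]
probs = route(token, weights, bias)
chosen = top_k_experts(probs, k=2)
```

Here `probs` holds one selection probability per expert and sums to 1, and `chosen` lists the two experts the token would be sent to under the assumed top-k rule.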
