MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia
–arXiv.org Artificial Intelligence
Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE layers to satisfy the constraints of existing software and hardware. These formulations force a tradeoff between model quality and hardware efficiency: users must choose between dropping tokens from the computation or wasting computation and memory on padding. To address these limitations, we reformulate MoE computation in terms of block-sparse operations and develop new block-sparse GPU kernels that efficiently handle the dynamism present in MoEs. Our approach never drops tokens and maps efficiently to modern hardware, enabling end-to-end training speedups of up to 40% over MoEs trained with the state-of-the-art Tutel library and 2.4x over DNNs trained with the highly-optimized Megatron-LM framework.

From the introduction: The past decade has seen significant progress in algorithms and high-performance software to make sparsity practically useful (Gray et al., 2017; Narang et al., 2017; Kalchbrenner et al., 2018; Elsen et al., 2020; Gale et al., 2020). [...] are fundamental to these architectures. However, existing hardware and software for deep learning make it difficult to meet this challenge. For example, TPUs and their XLA compiler require all tensor shapes to be known statically [...]
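The tradeoff the abstract describes, dropping tokens versus padding under a fixed expert capacity, as opposed to routing every token with no padding, can be made concrete with a small sketch. The following is a minimal, hypothetical PyTorch illustration and not the MegaBlocks implementation (which uses block-sparse GPU kernels); the function names `capacity_factor_moe` and `dropless_moe` and all shapes are assumptions made for illustration.

```python
# Minimal, hypothetical sketch of the two MoE formulations described above,
# written in plain PyTorch. It is NOT the MegaBlocks implementation; names and
# shapes are illustrative assumptions.
import torch

num_tokens, d_model, d_ff, num_experts = 8, 4, 16, 2
x = torch.randn(num_tokens, d_model)                # token activations
w1 = torch.randn(num_experts, d_model, d_ff)        # per-expert weight matrices
expert = torch.randint(num_experts, (num_tokens,))  # top-1 routing assignments


def capacity_factor_moe(x, w1, expert, capacity):
    """Fixed-capacity formulation: each expert handles exactly `capacity` tokens,
    so overflowing tokens are dropped and underfull experts are zero-padded."""
    out = torch.zeros(x.shape[0], w1.shape[-1])
    for e in range(w1.shape[0]):
        idx = (expert == e).nonzero(as_tuple=True)[0]
        kept = idx[:capacity]                        # tokens beyond capacity are dropped
        buf = torch.zeros(capacity, x.shape[1])      # zero padding wastes compute and memory
        buf[:kept.numel()] = x[kept]
        y = buf @ w1[e]                              # dense (capacity, d_ff) matmul
        out[kept] = y[:kept.numel()]
    return out


def dropless_moe(x, w1, expert):
    """Variable-size formulation: every token reaches its expert and nothing is
    padded; each per-expert matmul is sized exactly to its group of tokens."""
    out = torch.empty(x.shape[0], w1.shape[-1])
    order = torch.argsort(expert)                    # group token indices by expert
    counts = torch.bincount(expert, minlength=w1.shape[0])
    start = 0
    for e, n in enumerate(counts.tolist()):
        idx = order[start:start + n]
        out[idx] = x[idx] @ w1[e]                    # matmul sized to this expert's group
        start += n
    return out


print(capacity_factor_moe(x, w1, expert, capacity=num_tokens // num_experts).shape)
print(dropless_moe(x, w1, expert).shape)
```

In the paper, the variable-sized per-expert products are expressed as a single block-sparse matrix multiplication on the GPU rather than a Python loop; the sketch above only emulates the resulting "dropless" semantics.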
Nov-28-2022
- Country:
  - Europe
  - North America
    - Canada
    - United States
      - Arizona > Maricopa County
        - Phoenix (0.04)
      - California
      - Georgia > Fulton County
        - Atlanta (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Missouri > St. Louis County
        - St. Louis (0.04)
      - New York > New York County
        - New York City (0.04)
      - Washington > King County
        - Redmond (0.04)
- Genre:
  - Research Report (0.50)
- Industry:
  - Information Technology (0.30)
- Technology: