Hash Layers For Large Sparse Models

Neural Information Processing Systems 

A key component of a mixture-of-experts (MoE) model is the routing (gating) strategy, which decides which expert processes each token.
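
As a rough illustration of what a routing strategy does, the sketch below contrasts two ways of assigning tokens to experts: a fixed rule that depends only on the token id, and a learned gate that scores each expert from the token's hidden state. The function names, the modulo rule standing in for a hash function, and the toy gate weights are all assumptions made for this example and are not taken from the paper.

```python
def hash_route(token_ids, num_experts):
    """Fixed routing: the expert index depends only on the token id.

    Modulo is used here as a simple stand-in for a hash function
    (an assumption for illustration, not the paper's exact scheme).
    """
    return [tid % num_experts for tid in token_ids]


def learned_top1_route(hidden_states, gate_weights):
    """Learned routing: pick the expert with the highest gate score.

    gate_weights holds one weight vector per expert; the score of an
    expert is the dot product of its vector with the hidden state.
    """
    routes = []
    for h in hidden_states:
        scores = [sum(w * x for w, x in zip(row, h)) for row in gate_weights]
        routes.append(max(range(len(scores)), key=scores.__getitem__))
    return routes


if __name__ == "__main__":
    # Six toy token ids routed across four experts by the fixed rule.
    print(hash_route([0, 1, 2, 3, 4, 5], 4))

    # Two toy hidden states scored against three hypothetical experts.
    gate = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(learned_top1_route([[1.0, 0.0], [0.0, 1.0]], gate))
```

The fixed rule has no parameters to train and always sends the same token id to the same expert, whereas the learned gate's assignments change as its weights are updated.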
