Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
Neural Information Processing Systems
Most widely adopted MoE models are discontinuous with respect to their parameters and are often referred to as sparse.
Feb-17-2026, 19:42:19 GMT
- Country:
- Asia
- Japan > Honshū
- Chūbu > Toyama Prefecture > Toyama (0.04)
- Middle East > Jordan (0.05)
- Europe
- Monaco (0.04)
- Poland > Masovia Province
- Warsaw (0.05)
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (1.00)
- Technology: