Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
–Neural Information Processing Systems
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance.
Dec-27-2025, 05:28:30 GMT