Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

Dec-27-2025, 05:28:30 GMT–Neural Information Processing Systems

Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance.

monarch mixer, sequence length and model dimension, simple sub-quadratic gemm-based architecture

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.75)