Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices

May-26-2025, 15:27:23 GMT–Neural Information Processing Systems

Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts to develop alternatives have focused on a small number of hand-crafted structured matrices, and have neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their scaling laws.
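To make the phrase "linear operators expressible via an Einstein summation" concrete, here is a minimal NumPy sketch of how a dense layer and two of the structures named above (low-rank and Kronecker) can each be written as a single einsum. The sizes, variable names, and index letters are illustrative assumptions, not the paper's parameterization or notation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 16, 4  # illustrative sizes (assumptions, not from the paper)
x = rng.standard_normal(d_in)

# Dense baseline: y[o] = sum_i W[o, i] * x[i]
W = rng.standard_normal((d_out, d_in))
y_dense = np.einsum("oi,i->o", W, x)

# Low-rank: W = U @ V with U of shape (d_out, r) and V of shape (r, d_in)
U = rng.standard_normal((d_out, r))
V = rng.standard_normal((r, d_in))
y_low_rank = np.einsum("or,ri,i->o", U, V, x)

# Kronecker: W = kron(A, B); the input is viewed as a 4 x 4 grid
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
y_kron = np.einsum("ac,bd,cd->ab", A, B, x.reshape(4, 4)).reshape(-1)

# Parameter counts for each structure at these sizes
n_dense = W.size               # d_out * d_in
n_low_rank = U.size + V.size   # r * (d_out + d_in)
n_kron = A.size + B.size       # two small factors
```

All three layers map a 16-dimensional input to a 16-dimensional output, but the structured variants use far fewer parameters; each einsum string specifies one such structure.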

artificial intelligence, inductive learning, machine learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Inductive Learning (0.62)
  - Neural Networks (0.42)