Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

Neural Information Processing Systems

We explore the scaling behavior of structured matrices and find steeper scaling curves with respect to training FLOPs, along with a favorable scaling trend in the overtraining regime. Specifically, we show that wide and structured networks use training FLOPs more efficiently, reaching lower loss with fewer parameters than dense models at their optimal trade-off.
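
To make the parameter-saving claim concrete, the following is a minimal sketch that counts weights in a dense feedforward block and in a structured one. It assumes a low-rank factorization as the structure, which is only one possible choice and is not confirmed by the excerpt above; the function names and the dimensions (`d_model`, `d_ff`, `rank`) are hypothetical and chosen for illustration only.

```python
def dense_ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in a two-matrix feedforward block (biases ignored)."""
    # Up-projection (d_model x d_ff) plus down-projection (d_ff x d_model).
    return 2 * d_model * d_ff


def low_rank_ffn_params(d_model: int, d_ff: int, rank: int) -> int:
    """Same block, with each matrix replaced by two low-rank factors.

    Assumption: "structured" here means W ~= U @ V with U of shape
    (d_model, rank) and V of shape (rank, d_ff). Other structures exist.
    """
    # Each factorized matrix costs rank * (d_model + d_ff) weights.
    return 2 * rank * (d_model + d_ff)


# Hypothetical sizes, not taken from the paper.
d_model, d_ff, rank = 1024, 4096, 256

dense = dense_ffn_params(d_model, d_ff)
low_rank = low_rank_ffn_params(d_model, d_ff, rank)

print(f"dense:    {dense:,} weights")
print(f"low-rank: {low_rank:,} weights")
print(f"ratio:    {low_rank / dense:.4f}")
```

Under these illustrative sizes the factorized block keeps about 31% of the dense weights. Because each weight in a matrix multiply costs roughly one multiply-add per token, the matmul FLOPs of the block shrink by about the same ratio, which is the kind of saving that could be reinvested in a wider network.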