Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

Neural Information Processing Systems 

We explore the scaling behavior of structured matrices, revealing steeper loss-versus-training-FLOPs curves along with a favorable scaling trend in the overtraining regime. Specifically, we show that wide, structured networks can utilize training FLOPs more efficiently, achieving lower loss with fewer parameters than dense models at their optimal trade-off.
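As a minimal illustration of why structured feedforward layers can have fewer parameters than dense ones, the sketch below uses a low-rank factorization, one common structured alternative (chosen here for illustration; the paper's specific structured parameterizations may differ). It compares parameter counts of a dense two-matrix FFN against its low-rank counterpart and shows the structured forward pass; the dimensions `d_model`, `d_ff`, and `rank` are assumed example values.

```python
import numpy as np

def dense_ffn_params(d_model, d_ff):
    # Dense FFN: two weight matrices, W1 (d_model x d_ff) and W2 (d_ff x d_model).
    return 2 * d_model * d_ff

def lowrank_ffn_params(d_model, d_ff, rank):
    # Each dense matrix is replaced by a product of two thin factors,
    # e.g. W1 ~ U1 @ V1 with U1 (d_model x rank) and V1 (rank x d_ff).
    return 2 * (d_model * rank + rank * d_ff)

def lowrank_ffn_forward(x, U1, V1, U2, V2):
    # Structured forward pass: project through the thin factors,
    # apply ReLU, then project back to d_model.
    h = np.maximum(x @ U1 @ V1, 0.0)
    return h @ U2 @ V2

d_model, d_ff, rank = 512, 2048, 64
print(dense_ffn_params(d_model, d_ff))          # 2097152 parameters
print(lowrank_ffn_params(d_model, d_ff, rank))  # 327680 parameters
```

With these example dimensions the low-rank layer holds roughly 6x fewer parameters, and its per-token FLOPs shrink by the same factor, which is the kind of budget reallocation that lets wider structured networks fit inside a fixed training-FLOP budget.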
