Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

Neural Information Processing Systems 

We explore the scaling behavior of structured matrices, revealing steeper loss-versus-training-FLOPs curves along with a favorable scaling trend in the overtraining regime. Specifically, we show that wide, structured networks can utilize training FLOPs more efficiently, achieving lower loss with fewer parameters than dense models at their optimal trade-off.
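As a minimal illustration of why structured feedforward layers can have fewer parameters than dense ones, the sketch below uses a low-rank factorization, one common structured alternative (chosen here for illustration; the paper's specific structured parameterizations may differ). It compares parameter counts of a dense two-matrix FFN against its low-rank counterpart and shows the structured forward pass; the dimensions `d_model`, `d_ff`, and `rank` are assumed example values.

```python
import numpy as np

def dense_ffn_params(d_model, d_ff):
    # Dense FFN: two weight matrices, W1 (d_model x d_ff) and W2 (d_ff x d_model).
    return 2 * d_model * d_ff

def lowrank_ffn_params(d_model, d_ff, rank):
    # Each dense matrix is replaced by a product of two thin factors,
    # e.g. W1 ~ U1 @ V1 with U1 (d_model x rank) and V1 (rank x d_ff).
    return 2 * (d_model * rank + rank * d_ff)

def lowrank_ffn_forward(x, U1, V1, U2, V2):
    # Structured forward pass: project through the thin factors,
    # apply ReLU, then project back to d_model.
    h = np.maximum(x @ U1 @ V1, 0.0)
    return h @ U2 @ V2

d_model, d_ff, rank = 512, 2048, 64
print(dense_ffn_params(d_model, d_ff))          # 2097152 parameters
print(lowrank_ffn_params(d_model, d_ff, rank))  # 327680 parameters
```

With these example dimensions the low-rank layer holds roughly 6x fewer parameters, and its per-token FLOPs shrink by the same factor, which is the kind of budget reallocation that lets wider structured networks fit inside a fixed training-FLOP budget.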
