Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
Neural Information Processing Systems
When training deep neural networks, a model's generalization error is often observed to follow a power scaling law dependent both on the model size and the
Oct-10-2025, 01:29:21 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Georgia > Fulton County > Atlanta (0.04)
- Genre:
- Research Report > Experimental Study (0.92)
- Technology: