Universal scaling laws in the gradient descent training of neural networks

Open in new window