Universal scaling laws in the gradient descent training of neural networks