Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks
