Sparse maximal update parameterization: A holistic approach to sparse training dynamics
