Width Provably Matters in Optimization for Deep Linear Neural Networks

Open in new window