SGD with Large Step Sizes Learns Sparse Features
