SGD with Large Step Sizes Learns Sparse Features