Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

Open in new window