Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy