Why do classifier accuracies show linear trends under distribution shift?