When does mixup promote local linearity in learned representations?