Towards Understanding the Data Dependency of Mixup-style Training