Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup