Learning Diverse Features in Vision Transformers for Improved Generalization