EAT: Self-Supervised Pre-Training with Efficient Audio Transformer