From Coarse to Fine: Efficient Training for Audio Spectrogram Transformers