Heterogeneous Self-Supervised Acoustic Pre-Training with Local Constraints