Towards Robust Speech Representation Learning for Thousands of Languages