Demystify Optimization Challenges in Multilingual Transformers