Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models