Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms