Recommendations for Comprehensive and Independent Evaluation of Machine Learning-Based Earth System Models