A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling