Mpemba Effect in Large-Language Model Training Dynamics: A Minimal Analysis of the Valley-River Model