Revisiting the Data Sampling in Multimodal Post-training from a Difficulty-Distinguish View