Diving into Self-Evolving Training for Multimodal Reasoning