Can Large Reasoning Models Self-Train?