RoiRL: Efficient, Self-Supervised Reasoning with Offline Iterative Reinforcement Learning