Reinforcement Learning on Pre-Training Data