Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models