Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning