Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement