Spontaneous Reward Hacking in Iterative Self-Refinement

Open in new window