Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction
