Causal Reward Adjustment: Mitigating Reward Hacking in External Reasoning via Backdoor Correction