RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
