The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
