When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning
