When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

Leon Lang, University of Amsterdam
Davis Foote