When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Leon Lang (University of Amsterdam), Davis Foote
Neural Information Processing Systems
Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is
Oct-10-2025, 12:43:26 GMT
- Country:
- Europe > Netherlands > North Holland > Amsterdam (0.40)
- North America > United States (0.14)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology: