When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

Leon Lang (University of Amsterdam), Davis Foote

Neural Information Processing Systems 

Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is
