Provable Reinforcement Learning from Human Feedback with an Unknown Link Function

Open in new window