A Minimaximalist Approach to Reinforcement Learning from Human Feedback