Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism