Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning