Dueling RL: Reinforcement Learning with Trajectory Preferences