Online Reinforcement Learning with Uncertain Episode Lengths