Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets