we study VRTDC in the online Markovian setting, which covers many real-world RL applications that have online