Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs
