Review for NeurIPS paper: Reinforcement Learning for Control with Multiple Frequencies

Neural Information Processing Systems 

Summary and Contributions: This work introduces an algorithm for reinforcement learning in settings with factored action spaces in which each element of the action space may have a different control frequency. To motivate the necessity of such an algorithm, it provides an argument that in this setting, a naive approach with a stationary Markovian policy on the states (which does not observe the timestep) can be suboptimal. Further, it argues that simply augmenting the state or action spaces and applying standard RL methods results in costs which are exponential in L, the least common multiple of the set of action persistences. In constructing the method this paper introduces c-persistent Bellman operators, a way of updating a Q-function in an environment with multiple action persistences, and proves its convergence. This leads to a method which uses L Q-functions, one for each step in the periodic structure of action persistences.