State Regularized Policy Optimization on Data with Dynamics Shift

Neural Information Processing Systems

We demonstrate a lower-bound performance guarantee for policies regularized by the stationary state distribution. In practice, SRPO can serve as an add-on module to context-based algorithms in both online and offline RL settings.