State Regularized Policy Optimization on Data with Dynamics Shift
Neural Information Processing Systems
We then establish a lower-bound performance guarantee for policies regularized by the stationary state distribution. In practice, SRPO (State Regularized Policy Optimization) can serve as an add-on module to context-based algorithms in both online and offline RL settings.
Feb-13-2026, 03:27:31 GMT