State Regularized Policy Optimization on Data with Dynamics Shift

Neural Information Processing Systems 

We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings.
