Goto

Collaborating Authors

 Education




State Regularized Policy Optimization on Data with Dynamics Shift

Neural Information Processing Systems

We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings.








Learning

Neural Information Processing Systems

For additional motivation, it is reasonable to consider Massart noise to be a more realistic model of real-life noise (even when benign) when compared to the RCN model, as it allows for some amount of non-uniformity. This made Definition 1 a possibly tractable way to relax the noise assumption, without running intotheaforementioned computational barriers foragnostic learning.