 Reinforcement Learning





State Regularized Policy Optimization on Data with Dynamics Shift

Neural Information Processing Systems

We then demonstrate a lower-bound performance guarantee on policies regularized by the stationary state distribution. In practice, SRPO can be an add-on module to context-based algorithms in both online and offline RL settings.


Importance Resampling for Off-policy Prediction

Neural Information Processing Systems

Though unbiased, IS can be high-variance. A lower variance alternative is Weighted IS (WIS). Figure 4: Learning rate sensitivity plots in the Random Walk Markov Chain, with buffer size n = 15000 and mini-batch size k = 16.
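
To illustrate the contrast between IS and WIS mentioned in this snippet, here is a minimal sketch of the two textbook estimators. It is not taken from the paper: the function names, the sample ratios, and the returns are all hypothetical, and the paper's own resampling method is not shown.

```python
# Minimal sketch of ordinary vs. weighted importance sampling (IS vs. WIS).
# All names and sample values here are illustrative assumptions.

def is_estimate(rhos, returns):
    """Ordinary IS: average of ratio-weighted returns.

    Unbiased, but a single large ratio can dominate the estimate.
    """
    return sum(r * g for r, g in zip(rhos, returns)) / len(rhos)


def wis_estimate(rhos, returns):
    """Weighted IS: normalize by the sum of the ratios instead of the count.

    Biased in general, but typically lower variance than ordinary IS.
    """
    total = sum(rhos)
    if total == 0:
        return 0.0  # no sample has support under the target policy
    return sum(r * g for r, g in zip(rhos, returns)) / total


# Hypothetical importance ratios pi(a|s) / mu(a|s) and observed returns.
rhos = [0.5, 0.5, 4.0]
returns = [1.0, 1.0, 1.0]

print(is_estimate(rhos, returns))   # pulled above 1.0 by the ratio of 4.0
print(wis_estimate(rhos, returns))  # stays at 1.0 because every return is 1.0
```

With constant returns, the weighted estimator recovers the true value regardless of the ratios, while the ordinary estimator fluctuates with them; this is one simple way to see why WIS tends to have lower variance.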





ASPiRe: Adaptive Skill Priors for Reinforcement Learning

Neural Information Processing Systems

Transferring prior experience to new tasks is central to an agent's adaptability. In this work, we aim to accelerate online reinforcement learning by leveraging prior experience from large offline data.


Distributional Reward Decomposition for Reinforcement Learning

Neural Information Processing Systems

Van Seijen et al. [2017] propose to split a state into different sub-states, each with a sub-reward obtained by training a general value function, and learn multiple value functions with sub-rewards. The architecture is rather limited due to requiring prior knowledge of how to split into sub-states.