Reinforcement Learning

Mutual Information Regularized Offline Reinforcement Learning

Neural Information Processing Systems

We show that optimizing this lower bound is equivalent to maximizing the likelihood of a one-step improved policy on the offline dataset. Hence, we constrain the policy improvement direction to lie in the data manifold.


Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark

Neural Information Processing Systems

Artificial intelligence (AI) systems possess significant potential to drive societal progress. However, their deployment often faces obstacles due to substantial safety concerns.

