Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

Neural Information Processing Systems 

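In offline RL, a critical challenge is distribution shift (also called "extrapolation error" in the literature): the learned policy may select state-action pairs that the offline dataset does not cover, where value estimates are unreliable.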
