LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning

Xi Chen

Neural Information Processing Systems 

In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios, such as data from several human demonstrators or from