LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning