LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning Xi Chen