LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning