LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning
Xi Chen
Neural Information Processing Systems
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios, such as data from several human demonstrators or from
Aug-19-2025, 17:57:02 GMT
- Country:
- Asia > China
- Guangdong Province > Shenzhen (0.04)
- North America > United States (0.04)
- Genre:
- Research Report (0.69)
- Technology: