Minimax Model Learning
Cameron Voloshin, Nan Jiang, Yisong Yue
arXiv.org Artificial Intelligence
We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
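The abstract does not state the loss itself, so the following is only a minimal sketch of how a minimax model-learning objective of this kind might look in a tabular setting. It assumes a loss of the form mean over logged transitions (s, a, s') of w(s, a) * (E_{x ~ P(.|s, a)}[V(x)] - V(s')), where w is a candidate importance weight and V a candidate value function, and it picks the transition model P whose worst-case absolute loss over small finite classes of w and V is smallest. The function names, the toy data, and the finite function classes are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def model_loss(w, V, P, data):
    """Assumed loss: mean of w(s,a) * (E_{x~P(.|s,a)}[V(x)] - V(s'))."""
    s, a, s_next = data.T
    expected_v = P[s, a] @ V  # model's expected next-state value per sample
    return np.mean(w[s, a] * (expected_v - V[s_next]))

def minimax_select(models, ws, Vs, data):
    """Return the index of the model with the smallest worst-case |loss|."""
    worst = [
        max(abs(model_loss(w, V, P, data)) for w in ws for V in Vs)
        for P in models
    ]
    return int(np.argmin(worst)), worst

# Toy problem (hypothetical): 2 states, 1 action, the true dynamics swap states.
data = np.array([[0, 0, 1], [1, 0, 0]])  # logged (s, a, s') tuples

P_swap = np.zeros((2, 1, 2))  # candidate matching the data
P_swap[0, 0, 1] = 1.0
P_swap[1, 0, 0] = 1.0

P_stay = np.zeros((2, 1, 2))  # misspecified candidate: never leaves a state
P_stay[0, 0, 0] = 1.0
P_stay[1, 0, 1] = 1.0

ws = [np.ones((2, 1)), np.array([[1.0], [0.0]])]  # candidate weights w(s, a)
Vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # candidate values V(s)

best, worst = minimax_select([P_swap, P_stay], ws, Vs, data)
print(best, worst)
```

In this toy case the candidate that matches the logged transitions incurs zero loss against every (w, V) pair, while the misspecified one is exposed by the weight that focuses on state 0. With richer function classes the inner maximization would typically be carried out by gradient ascent rather than enumeration.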
Mar-2-2021