RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning
–Neural Information Processing Systems
The model is trained to minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and adversarially optimising the model.
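The alternation described above can be illustrated with a minimal sketch on a toy one-step problem. This is not the paper's implementation: the function name `alternate_game`, the fixed penalty weight `lam`, the clipping of predicted rewards to `[r_min, r_max]`, and the learning rates are all assumptions made for illustration. The model predicts one reward per action and is pushed to lower the policy's value while a squared-error term keeps it close to the dataset; the policy then ascends the value under the updated model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alternate_game(dataset, n_actions=2, rounds=300, lr_policy=0.5,
                   lr_model=0.05, lam=5.0, r_min=0.0, r_max=1.0):
    """Alternate adversarial model updates and policy updates.

    dataset: list of (action, reward) pairs from a one-step problem.
    All hyperparameters are illustrative assumptions, not tuned values.
    """
    logits = [0.0] * n_actions     # policy parameters (softmax policy)
    r_hat = [r_max] * n_actions    # model: predicted reward per action
    for _ in range(rounds):
        pi = softmax(logits)

        # Model step: gradient descent on
        #   V(pi, model) + lam * mean squared error on the dataset.
        grad = list(pi)            # dV/dr_hat[a] = pi[a]
        for a, r in dataset:
            grad[a] += lam * 2.0 * (r_hat[a] - r) / len(dataset)
        r_hat = [min(r_max, max(r_min, r_hat[a] - lr_model * grad[a]))
                 for a in range(n_actions)]

        # Policy step: gradient ascent on V = sum_a pi[a] * r_hat[a].
        v = sum(p * r for p, r in zip(pi, r_hat))
        logits = [logits[a] + lr_policy * pi[a] * (r_hat[a] - v)
                  for a in range(n_actions)]
    return softmax(logits), r_hat


# Action 0 is covered by the dataset (reward 0.5); action 1 is never observed.
dataset = [(0, 0.5)] * 20
policy, rewards = alternate_game(dataset)
print("policy:", policy)
print("model rewards:", rewards)
```

In this toy setting the data term anchors the prediction for the covered action, while nothing stops the model from lowering the prediction for the uncovered action, so the policy ends up preferring the action the dataset supports.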
Aug-15-2025, 11:28:57 GMT
- Country:
  - Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Genre:
  - Research Report
    - New Finding (0.67)
    - Promising Solution (0.46)
- Technology: