Is Model Ensemble Necessary? Model-based RL via a Single Model with Lipschitz Regularized Value Function

Zheng, Ruijie, Wang, Xiyao, Xu, Huazhe, Huang, Furong

Feb-2-2023–arXiv.org Artificial Intelligence

Probabilistic dynamics model ensemble is widely used in existing model-based reinforcement learning methods as it outperforms a single dynamics model in both asymptotic performance and sample efficiency. In this paper, we provide both practical and theoretical insights on the empirical success of the probabilistic dynamics model ensemble through the lens of Lipschitz continuity. We find that, for a value function, the stronger the Lipschitz condition is, the smaller the gap between the true dynamics-and learned dynamics-induced Bellman operators is, thus enabling the converged value function to be closer to the optimal value function. Hence, we hypothesize that the key functionality of the probabilistic dynamics model ensemble is to regularize the Lipschitz condition of the value function using generated samples. To test this hypothesis, we devise two practical robust training mechanisms through computing the adversarial noise and regularizing the value network's spectral norm to directly regularize the Lipschitz condition of the value functions. Empirical results show that combined with our mechanisms, model-based RL algorithms with a single dynamics model outperform those with an ensemble of probabilistic dynamics models. These findings not only support the theoretical insight, but also provide a practical solution for developing computationally efficient model-based RL algorithms. Model-based reinforcement learning (MBRL) improves the sample efficiency of an agent by learning a model of the underlying dynamics in a real environment. One of the most fundamental questions in this area is how to learn a model to generate good samples so that it maximally boosts the sample efficiency of policy learning. To address this question, various model architectures are proposed such as Bayesian nonparametric models (Kocijan et al., 2004; Nguyen-Tuong et al., 2008; Kamthe & Deisenroth, 2018), inverse dynamics model (Pathak et al., 2017; Liu et al., 2022), multistep model (Asadi et al., 2019), and hypernetwork (Huang et al., 2021).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

Feb-2-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Maryland > Prince George's County
    - College Park (0.04)
  - California > San Mateo County
    - Menlo Park (0.04)
- Asia > China
  - Shanghai > Shanghai (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Government
  - Military (0.67)
  - Regional Government > North America Government
    - United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found