Revisiting Bellman Errors for Offline Model Selection

Zitovsky, Joshua P., de Marchi, Daniel, Agarwal, Rishabh, Kosorok, Michael R.

arXiv.org Machine Learning 

Offline model selection (OMS), that is, choosing the best policy from a set of many policies given only logged data, is crucial for applying offline RL in real-world settings. One idea that has been extensively explored is to select policies based on the mean squared Bellman error (MSBE) of the associated Q-functions. However, previous work has struggled to obtain adequate OMS performance with Bellman errors, leading many researchers to abandon the idea. To this end, we elucidate why previous work has seen pessimistic results with Bellman errors and identify conditions under which OMS algorithms based on Bellman errors will perform well. Moreover, we develop a new estimator of the MSBE that is a better proxy for the true Bellman errors than the empirical Bellman errors used in prior work.

Unfortunately, value estimates computed from logged data are often inaccurate (Fu et al., 2021). As an alternative, many works have explored using empirical Bellman errors to perform OMS, but have found them to be poor predictors of value model accuracy (Irpan et al., 2019; Paine et al., 2020). This has led to a belief among many researchers that Bellman errors are not useful for OMS (Géron, 2019; Fujimoto et al., 2022). To this end, we propose a new algorithm, Supervised Bellman Validation (SBV), that provides a better proxy for the true Bellman errors than empirical Bellman errors. SBV achieves strong performance on diverse tasks ranging from healthcare problems (Klasnja et al., 2015) to Atari games (Bellemare et al., 2013).
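
For reference, the quantities discussed above can be written out explicitly. The notation below is the standard textbook formulation and is an assumption on our part, not text taken from the paper: with $T^{\pi}$ denoting the Bellman operator of the target policy $\pi$ and $\mu$ the distribution of logged state-action pairs, the true MSBE of a candidate Q-function and its naive empirical counterpart over $n$ logged transitions are

\[
\mathrm{MSBE}(Q) \;=\; \mathbb{E}_{(s,a)\sim\mu}\Big[\big(Q(s,a) - (T^{\pi}Q)(s,a)\big)^{2}\Big],
\qquad
(T^{\pi}Q)(s,a) \;=\; \mathbb{E}\big[\,r + \gamma\, Q(s',\pi(s')) \,\big|\, s,a\big],
\]
\[
\widehat{\mathrm{MSBE}}(Q) \;=\; \frac{1}{n}\sum_{i=1}^{n}\big(Q(s_i,a_i) - r_i - \gamma\, Q(s_i',\pi(s_i'))\big)^{2}.
\]

Because the empirical version replaces the conditional expectation inside the square with a single sampled transition, its expectation equals the true MSBE plus the conditional variance of the Bellman target. This well-known double-sampling issue is one reason empirical Bellman errors can rank Q-functions poorly even when the true MSBE would rank them well.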
