Neural Information Processing Systems
This paper proposes using random projections as a proxy for learning BEBFs (Bellman Error Basis Functions). Given a high-dimensional set of features and the current estimate of the value function, the features are randomly projected onto a lower-dimensional space, and the temporal-difference errors (computed with respect to the current value estimate) are regressed on the projected features. The resulting scalar regression function is then added to the set of features used to estimate the value function. A finite-sample analysis is conducted; its main result shows that if the Bellman residual is linear in the high-dimensional features, then the Bellman error can be regressed well in the compressed space, with accuracy depending notably on the dimension of this space and on the number of samples. The authors also use this result to provide some guarantees on the estimated value function.
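To make the summarized procedure concrete, here is a minimal sketch of one iteration as I understand it from the description above. It is not the authors' implementation: the Gaussian projection matrix, the plain least-squares regression, and all names (`td_errors`, `add_compressed_bebf`, `x`, `phi`, `w`, `d`) are assumptions chosen for illustration, and the step that re-estimates the value-function weights after adding the new feature is left out.

```python
import numpy as np


def td_errors(phi, phi_next, rewards, w, gamma):
    """TD errors r + gamma * V(s') - V(s) for a linear estimate V = phi @ w."""
    return rewards + gamma * (phi_next @ w) - phi @ w


def add_compressed_bebf(x, x_next, rewards, phi, phi_next, w, gamma, d, rng):
    """Add one basis function fitted to the TD errors in a random subspace.

    x, x_next     : (n, D) high-dimensional features of states and next states
    phi, phi_next : (n, k) features currently used by the value estimate
    w             : (k,) current value-function weights
    d             : dimension of the compressed space (assumed d <= D)
    """
    D = x.shape[1]
    # Assumed choice: i.i.d. Gaussian projection with variance 1/d.
    proj = rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))
    z, z_next = x @ proj, x_next @ proj

    # Regress the TD errors of the current estimate on the projected features.
    delta = td_errors(phi, phi_next, rewards, w, gamma)
    coef, *_ = np.linalg.lstsq(z, delta, rcond=None)

    # The scalar regressed function becomes a new feature column.
    new_phi = np.column_stack([phi, z @ coef])
    new_phi_next = np.column_stack([phi_next, z_next @ coef])
    return new_phi, new_phi_next
```

In this sketch the caller would start from a small feature set (for example a constant column with zero weights), call `add_compressed_bebf`, refit the weights on the augmented features with whatever value-estimation method is in use, and repeat.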
Mar-13-2024, 18:07:50 GMT