86e8f7ab32cfd12577bc2619bc635690-Reviews.html

Mar-13-2024, 18:07:50 GMT–Neural Information Processing Systems

This paper proposes using random projections as a proxy for learning BEBFs (Bellman Error Basis Functions). Given a high-dimensional set of features and the current value function estimate, the features are randomly projected onto a lower-dimensional space, and the temporal difference errors (computed with respect to the current value function estimate) are regressed on these projected features. The resulting scalar-valued regression function is then added to the set of features used to estimate the value function. A finite-sample analysis is conducted; the main result shows that if the Bellman residual is linear in the high-dimensional features, then the Bellman error can be regressed well in the compressed space, with accuracy depending notably on the dimension of this space and on the number of samples. The authors also use this result to provide some guarantees on the estimated value function.
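One iteration of the procedure summarized above might look like the following minimal sketch. It is an illustration and not the authors' implementation: the Gaussian projection matrix, the plain least-squares regression, and all names (`fit_compressed_bebf`, `bebf_feature`, `phi`, `v`, `v_next`) are assumptions made for this example.

```python
import numpy as np

def fit_compressed_bebf(phi, rewards, v, v_next, gamma, d, rng):
    """Fit one new basis function on randomly projected features.

    phi     : (n, D) high-dimensional features of the sampled states
    rewards : (n,)   observed rewards
    v       : (n,)   current value estimates at the sampled states
    v_next  : (n,)   current value estimates at the successor states
    gamma   : discount factor
    d       : dimension of the compressed space (d << D in practice)
    rng     : numpy random Generator
    """
    D = phi.shape[1]
    # Temporal difference errors under the current value estimate.
    td = rewards + gamma * v_next - v
    # Random projection from D to d dimensions (Gaussian entries are an
    # assumption here; the review does not specify the distribution).
    proj = rng.normal(0.0, 1.0 / np.sqrt(d), size=(D, d))
    z = phi @ proj
    # Regress the TD errors on the projected features.
    w, *_ = np.linalg.lstsq(z, td, rcond=None)
    return proj, w

def bebf_feature(phi, proj, w):
    """Evaluate the new scalar basis function on features phi."""
    return (phi @ proj) @ w
```

The scalar output of `bebf_feature` would then be appended as an extra column to the feature set used to re-estimate the value function, and the loop repeated with the updated estimate.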



Genre:
- Research Report (0.37)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.39)