fe709c654eac84d5239d1a12a4f71877-Reviews.html
–Neural Information Processing Systems
The main idea is to sample several determinations of the system in the form of roll-out trees where each state/action pair has only one sampled successor. A combination of breadth-first and best-first search is used to explore the deterministic trees, and then they are recombined to create a stochastic model from which a policy can be calculated. The algorithm is proven to be consistent (as the number of trees and number of nodes in each tree both approach infinity, the value at the root can be arbitrarily approximated with high probability). The algorithm is empirically compared to an planning algorithm that requires a full transition model and performs well in comparison.
Neural Information Processing Systems
Mar-14-2024, 01:08:14 GMT
- Technology: