Appendix

Neural Information Processing Systems 

According to Alg. 2, each exploration expands at least one leaf node. Moreover, the overall size of the belief tree is $O\big((|A| \min(P_\delta^{\max}, N_{\max}))^D\big)$, where $N_{\max}$ is the maximum sample size given by KLD-Sampling, $P_\delta^{\max} = \sup_{b,a} P_\delta(Y_{b,a})$, and $Y_{b,a}$ is the set of reachable beliefs after executing action $a$ at belief $b$. The tree size is limited since $N_{\max}$ is finite.

The weights are normalized, i.e., they sum to one. There exist bounded functions $\alpha$ and $\alpha'$ such that $V(b) = \int \alpha(s)\, b(s)\, ds$ and $V(b') = \int \alpha'(s)\, b'(s)\, ds$. We can bound the first and third terms, respectively, by $\lambda$ in light of the assumptions.
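To get a feel for the tree-size bound stated above, the following minimal sketch evaluates the quantity inside the $O(\cdot)$, namely $(|A| \min(P_\delta^{\max}, N_{\max}))^D$. It is an illustration only, not part of the paper's algorithm: the function name and parameter names are hypothetical, and reading $|A|$ as the number of actions and $D$ as the tree depth is an assumption, since this section does not define them.

```python
def belief_tree_size_bound(num_actions, packing_number, max_samples, depth):
    """Evaluate (|A| * min(P_delta_max, N_max)) ** D.

    All argument names are illustrative assumptions:
      num_actions    -- |A|, assumed to be the number of actions
      packing_number -- P_delta_max, may be float("inf") if unbounded
      max_samples    -- N_max, the finite KLD-Sampling cap
      depth          -- D, assumed to be the tree depth
    """
    if num_actions < 1 or max_samples < 1 or depth < 0:
        raise ValueError("expected num_actions >= 1, max_samples >= 1, depth >= 0")
    # Per-action branching is capped by whichever of the two terms is smaller.
    branching = num_actions * min(packing_number, max_samples)
    return branching ** depth


if __name__ == "__main__":
    # The packing number is the smaller term here: (2 * 5) ** 3
    print(belief_tree_size_bound(2, 5, 100, 3))             # 1000
    # Unbounded packing number: the finite N_max keeps the bound finite.
    print(belief_tree_size_bound(3, float("inf"), 100, 2))  # 90000
```

The second call shows why a finite $N_{\max}$ is enough to limit the tree size: the bound stays finite even when the packing term is unbounded.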
