Neural Information Processing Systems 

This approach is often preferable when using "standard" parametric regression algorithms, as the weighted p-norm can be directly minimized at learning time (e.g., in least-squares regression, the ℓ2-norm is minimized).

This is not surprising, as in the worst case the inherent Bellman error may be unbounded and standard AVI tends to diverge. Recent work (Jinglin Chen and Nan Jiang, Information-Theoretic Considerations in Batch Reinforcement Learning, ICML 2019, Conjecture 8) has even conjectured an exponential lower bound in the case of unbounded Bellman error.

In the paper we propose a first heuristic algorithm to automatically construct a set of anchor points (see the beginning of page 6). A key benefit of our approach is that it does not modify the underlying (linear) feature representation. The user can therefore use the linear representation with, for example, approximate value iteration; should this fail, the user can switch to our algorithm and progressively increase the number of support points while keeping the same feature representation.
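To illustrate how a weighted ℓ2 regression step can make standard AVI diverge, here is a minimal Python/NumPy sketch. It is not the anchor-point algorithm proposed in the paper; it uses a hypothetical two-state MDP (a well-known textbook counterexample) with zero rewards, where both states move deterministically to state 2, so the true value function is zero. The linear features are assumed to be φ(1) = 1 and φ(2) = 2, and each AVI step fits θ by weighted least squares to the Bellman backup targets. The helper names `weighted_least_squares` and `run_avi` are illustrative only. With uniform weights, one step gives θ_{k+1} = 1.2 γ θ_k, so the iterates blow up whenever γ > 5/6 because the projected Bellman backup is never exact for θ ≠ 0.

```python
import numpy as np


def weighted_least_squares(features, targets, weights):
    """Return theta minimizing sum_i weights[i] * (features[i] @ theta - targets[i]) ** 2.

    This is the weighted l2-norm projection used by regression-based AVI.
    """
    sqrt_w = np.sqrt(weights)
    # Scaling rows by sqrt(weight) turns ordinary least squares into weighted least squares.
    theta, *_ = np.linalg.lstsq(features * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return theta


def run_avi(gamma, n_iters=50, theta0=1.0):
    """Linear AVI on a hypothetical two-state MDP with zero rewards.

    Both states transition deterministically to state 2 (index 1), so V* = 0.
    Assumed features: phi(1) = 1, phi(2) = 2, hence V_theta = (theta, 2 * theta).
    Returns the sequence of theta values, starting with theta0.
    """
    features = np.array([[1.0], [2.0]])
    next_state = np.array([1, 1])    # every state moves to state 2
    rewards = np.zeros(2)
    weights = np.array([0.5, 0.5])   # uniform weighting over the two states
    theta = np.array([theta0])
    history = [theta[0]]
    for _ in range(n_iters):
        values = features @ theta
        # Bellman backup targets: r(s) + gamma * V_theta(next state).
        targets = rewards + gamma * values[next_state]
        theta = weighted_least_squares(features, targets, weights)
        history.append(theta[0])
    return history


if __name__ == "__main__":
    # theta_k grows like (1.2 * gamma) ** k: diverges for gamma = 0.9, shrinks for gamma = 0.5.
    print(run_avi(0.9)[-1])
    print(run_avi(0.5)[-1])
```

Note that the divergence happens even though the target function (V* = 0) lies inside the linear class; the problem is that the regression step is applied to backups that the features cannot represent, which is the situation the inherent Bellman error quantifies.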
