908c9a564a86426585b29f5335b619bc-AuthorFeedback.pdf

Feb-12-2026, 22:38:38 GMT–Neural Information Processing Systems

This approach is often preferable when using "standard" parametric regression algorithms, as the weighted p-norm can be directly minimized at learning time (e.g., in least-squares regression, the ℓ2-norm is minimized). This is not surprising, as in the worst case the inherent Bellman error may be unbounded and standard AVI tends to diverge. Recent work (Jinglin Chen, Nan Jiang, Information-Theoretic Considerations in Batch Reinforcement Learning, ICML 2019, Conjecture 8) has even conjectured an exponential lower bound in case of unbounded Bellman error. In the paper we propose a first heuristic algorithm to automatically construct a set of anchor points (see beginning of page 6). A key benefit of our approach is that it does not modify the underlying (linear) feature representation, allowing the user to use the linear representation with, for example, approximate value iteration, and should this fail, the user can switch to our algorithm and progressively increase the number of support points while keeping the same feature representation.
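
The remark that least-squares regression directly minimizes a weighted ℓ2-norm can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name `weighted_least_squares`, the toy data, and the weights (standing in for a sampling distribution) are all assumptions made for illustration only.

```python
import numpy as np

def weighted_least_squares(features, targets, weights):
    """Return theta minimizing sum_i weights[i] * (features[i] @ theta - targets[i])**2."""
    sqrt_w = np.sqrt(weights)
    # Scaling each row by sqrt(w_i) turns the weighted problem into an ordinary one.
    theta, *_ = np.linalg.lstsq(features * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return theta

# Hypothetical toy data: targets are exactly linear in the features.
features = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
true_theta = np.array([0.5, 2.0])
targets = features @ true_theta
weights = np.array([0.1, 0.2, 0.3, 0.4])  # assumed sampling distribution

theta = weighted_least_squares(features, targets, weights)
```

Because the toy targets are exactly representable by the linear features, the fit should recover `true_theta` up to floating-point error; with targets outside the span of the features, the weights determine where the residual is allowed to be large.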
