Related Work

Apr-25-2026, 18:37:41 GMT–Neural Information Processing Systems

Deterministic RL. Deterministic systems are often the starting point in the study of sample-efficient algorithms, because the exploration-exploitation trade-off is revealed more clearly when both the transition kernel and the reward function are deterministic. The seminal work [81] proposes a sample-efficient Q-learning algorithm that works for a family of function classes. More recently, [21] studies the agnostic setting, where the optimal Q-function can only be approximated by the function class up to some approximation error. The algorithm in [21] learns the optimal policy with a number of trajectories that is linear in the eluder dimension.

Consider an MDP $\mathcal{M}$ where the transition is deterministic. Assume the function class in Definition 3.1 satisfies Assumption 2.1 and Assumption 2.2. For any $t \in (0,1)$, if $d \ge \Omega(\log(B_W/\lambda))$ and $n \ge d \cdot \mathrm{poly}(\kappa, k, \lambda, B_W, B_\phi, H, \log(d/t))$, then with probability at least $1 - t$, Algorithm 1 returns the optimal policy $\pi^*$.



