Supplementary Materials A Proof of Theorem 2: Asymptotic Convergence of Robust Q-Learning

Feb-8-2026, 06:47:18 GMT–Neural Information Processing Systems

From [Borkar and Meyn, 2000], we know that the stochastic approximation (18) converges to the fixed point of T, i.e., Q . Finally, to show Theorem 3, we only need to show that each term in (56) is smaller than . In this section we develop the finite-time analysis of the robust TDC algorithm. We note that several recent works [Srikant and Ying, 2019, Xu and Liang, 2021, Kaledin et al., 2020] give finite-time analyses of RL algorithms that do not need the projection. Specifically, the problem in [Srikant and Ying, 2019] is for one-time-scale linear stochastic approximation.



