Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples
Tengyu Xu, Shaofeng Zou, Yingbin Liang
Neural Information Processing Systems
Feb-14-2026, 21:05:28 GMT
- Country:
- North America
- Canada > Alberta (0.14)
- United States > Ohio (0.04)
- Genre:
- Research Report > New Finding (0.47)
- Technology: