Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

Neural Information Processing Systems 

In reinforcement learning (RL), an agent interacts with a stochastic environment in order to maximize the total reward [Sutton and Barto, 2018].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found