On the Convergence of Smooth Regularized Approximate Value Iteration Schemes

Neural Information Processing Systems 

In practical settings, reinforcement learning (RL) algorithms face the challenge of maximizing cumulative reward given a finite sample of environment transitions and an inexact representation of the policy and value function. This gives rise to errors that propagate across learning iterations and, combined, can result in divergence. Recently, state-of-the-art RL algorithms have been successful in solving complex environments and, hence, in overcoming inaccuracies and their accumulation.
