A Unifying Framework of Off-Policy General Value Function Evaluation

Neural Information Processing Systems 

We propose a new algorithm called GenTD for off-policy GVFs evaluation and show that GenTD learns multiple interrelated multidimensional GVFs as efficiently as a single canonical scalar value function.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found