${}_d(s, \pi_o(s)) = R_o(s) + \gamma^d G_o(s)$, max

Neural Information Processing Systems 

Notice we now have $B_o(\sigma_o, A, d) = \sigma_o B(A^{d-1})$ and $B_e(\sigma_e, A, d) = \sigma_e B(A^d - A^{d-1})$.

Case I: $\pi_o(s) \in \arg\max_a Q^{\pi_o}_d(s, a)$. Then, the second event in (22) is an empty set and we have that $\{\pi^{BCTS}_d(s) \notin \arg\max \dots$

As seen, for Space-Invaders, the correction improves convergence in all tested depths.

We compare the standard update method with the update based on the propagated value from the tree nodes, as proposed in [14].
