
Neural Information Processing Systems 

For supervised learning, [7] showed that gradually increasing the entropy of the training distribution helped. In RL, however, breaking a task down into sub-problems that can be ordered by difficulty is non-trivial [2]. For video games, [4] adapted the concept with a starting state placed increasingly further from the end of a demonstration. Thus, contrary to [1, 3, 4], we do not "reverse time" to artificially build a sequence of tasks that start further and further from a goal state and are subsequently harder to solve, in the hope of learning how to reach this goal from all possible starting states. Instead, we stack new optimization problems on top of previous ones, which gradually increases the computational complexity of the task, in order to learn to act optimally in optimization problems with an increasing number of levels. Thus, contrary to most problems in RL, here we are faced with a task naturally constituted of a hierarchy of sub-problems ordered by their position in the Polynomial Hierarchy, which motivates a curriculum.
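To make the contrast concrete, the reverse-time scheme of [4] that we do *not* follow can be sketched as follows: start states are sampled from a demonstration, moving further back from the goal as the curriculum stage advances. This is only an illustrative sketch; the function name, the `stage` indexing, and the `window` knob are our assumptions, not details from [4].

```python
import random

def reverse_curriculum_start(demonstration, stage, window=5):
    """Sample a start state from a demonstration trajectory.

    The last state of `demonstration` is the goal. Stage 0 samples
    states just before the goal; higher stages sample states further
    back in time, until stage windows reach the trajectory's start.
    (`window` controls how many candidate states each stage spans;
    it is an assumed hyperparameter for this sketch.)
    """
    n = len(demonstration)
    hi = n - 2 - stage * window          # closest-to-goal candidate index
    lo = max(0, hi - window + 1)         # furthest candidate this stage
    hi = max(lo, hi)                     # clamp once the start is reached
    return demonstration[random.randint(lo, hi)]
```

By contrast, our curriculum never reorders time within a single task: each stage is a fresh optimization problem with one more level stacked on top of the previous ones.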
