Teachable Reinforcement Learning via Advice Distillation

Neural Information Processing Systems 

Colors designate the supervision used: shades of blue = high-level advice; red = low-level advice; black = oracle demonstrations; gray = shaped rewards. Figure 6: "Best advice" is OffsetAdvice.
