Imitation-Projected Programmatic Reinforcement Learning

Abhinav Verma, Hoang Le, Yisong Yue, Swarat Chaudhuri

Neural Information Processing Systems 

However, such a distillation process can yield a highly suboptimal programmatic policy -- i.e., a large
