Neural Information Processing Systems 

For both algorithms, M_t represents a set of models which are consistent with the experience gathered so far, i.e. have low error on the current replay buffer. Both versions of the algorithm are based on two key ideas: i) computing exploration policies designed to induce disagreement between plausible models, and ii) doing so internally, without having to interact with the environment. We will add this in the updated paper.
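
The two ideas can be illustrated with a toy sketch. This is not the algorithm from the paper: it assumes deterministic tabular models on a four-state chain, treats "low error" as zero mismatches on the replay buffer, and replaces policy computation with an exhaustive search over short action sequences. All names (`consistent_models`, `most_disagreeing_plan`, `model_a`, `model_b`, `model_c`) are hypothetical and chosen only for illustration.

```python
from itertools import product


def consistent_models(models, replay_buffer, max_errors=0):
    """Keep the models that mispredict at most max_errors stored transitions."""
    kept = []
    for model in models:
        errors = sum(1 for s, a, s_next in replay_buffer if model(s, a) != s_next)
        if errors <= max_errors:
            kept.append(model)
    return kept


def rollout(model, start_state, plan):
    """Simulate a plan inside a model; the real environment is never called."""
    state = start_state
    for action in plan:
        state = model(state, action)
    return state


def most_disagreeing_plan(models, start_state, actions, horizon):
    """Exhaustively search for the action sequence whose predicted final
    state differs the most across the given models (assumed search method)."""
    best_plan, best_score = None, -1
    for plan in product(actions, repeat=horizon):
        final_states = {rollout(m, start_state, plan) for m in models}
        score = len(final_states) - 1  # 0 means every model agrees
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score


# Hypothetical candidate models on a chain with states 0..3.
def model_a(s, a):
    return min(s + 1, 3) if a == "R" else max(s - 1, 0)


def model_b(s, a):
    # Same as model_a except for one transition not yet observed.
    if s == 2 and a == "R":
        return 0
    return model_a(s, a)


def model_c(s, a):
    # Contradicts the replay buffer: "R" never moves the agent.
    return s if a == "R" else max(s - 1, 0)


replay_buffer = [(0, "R", 1), (1, "R", 2)]  # (state, action, next_state)
plausible = consistent_models([model_a, model_b, model_c], replay_buffer)
plan, score = most_disagreeing_plan(plausible, start_state=0,
                                    actions=("L", "R"), horizon=3)
```

In this toy setting `model_c` is discarded because it contradicts the buffer, and the search selects a plan that reaches the one transition on which the two remaining models differ. Executing that plan in the real environment would then rule out one of them.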
