03b264c595403666634ac75d828439bc-AuthorFeedback.pdf
–Neural Information Processing Systems
For both algorithms,Mt represents aset of models which are19 consistent with the experience gathered so far, i.e. have low error on the current replay buffer. Both versions of the algorithm are based on two key ideas: i) computing exploration policies designed to induce25 disagreement between plausiblemodels ii)doing sointernally,without havingtointeract withtheenvironment. We will add this in the updated paper.
Neural Information Processing Systems
Feb-11-2026, 08:28:31 GMT