

Reinforcement Learning


When to Ask for Help: Proactive Interventions in Autonomous Reinforcement Learning

Neural Information Processing Systems

Figure 4: A subset of our evaluation tasks: Tabletop Manipulation, Peg Insertion, and Half-Cheetah Velocity.






6af779991368999ab3da0d366c208fba-Paper-Conference.pdf

Neural Information Processing Systems

Planning enables autonomous agents to solve complex decision-making problems by evaluating predictions of the future. However, classical planning algorithms often become infeasible in real-world settings where state spaces are high-dimensional and transition dynamics unknown.



Self-Imitation Learning via Generalized Lower Bound Q-learning

Neural Information Processing Systems

The naive IS estimator involves products of the form $\pi(a_t \mid x_t)/\mu(a_t \mid x_t)$ and is infeasible in practice due to high variance. To control the variance, a line of prior work has focused on operator-based estimation to avoid full IS products, which reduces the estimation procedure to repeated iterations of off-policy evaluation operators [1-3].
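To make the variance problem concrete, here is a minimal NumPy sketch built on a toy setting that is an assumption and does not come from the paper. There are two actions, the behavior policy µ is uniform, the target policy π picks action 0 with probability 0.9, and state dependence is dropped. Each per-step ratio π/µ has mean 1 under µ but second moment 1.64, so the variance of the full product over a horizon T is 1.64^T - 1 and grows exponentially. The helper names (`full_is_weights`, `truncated_trace_weights`, `exact_full_is_variance`) are hypothetical. The min(1, π/µ) truncation is a Retrace-style trace and appears only as one familiar way of bounding the product. It is not presented as the operator proposed in this paper.

```python
import numpy as np

# Hypothetical toy setup (assumption, not from the paper):
# behavior policy MU is uniform; target policy PI prefers action 0.
MU = np.array([0.5, 0.5])
PI = np.array([0.9, 0.1])


def full_is_weights(actions: np.ndarray) -> np.ndarray:
    """Product over t of pi(a_t | x_t) / mu(a_t | x_t), one weight per trajectory.

    `actions` has shape (num_trajectories, horizon) and holds actions sampled
    from MU. State dependence is dropped here for simplicity.
    """
    ratios = PI[actions] / MU[actions]
    return np.prod(ratios, axis=1)


def truncated_trace_weights(actions: np.ndarray) -> np.ndarray:
    """Product of min(1, pi/mu) (Retrace-style truncated traces), always in [0, 1]."""
    ratios = PI[actions] / MU[actions]
    return np.prod(np.minimum(1.0, ratios), axis=1)


def exact_full_is_variance(horizon: int) -> float:
    """Var under MU of the full IS weight: E[rho^2]^T - E[rho]^(2T), with E[rho] = 1."""
    second_moment = float(np.sum(MU * (PI / MU) ** 2))  # 1.64 for this setup
    return second_moment ** horizon - 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for horizon in (1, 2, 5, 10):
        # Sample action sequences from the behavior policy.
        actions = rng.choice(2, size=(100_000, horizon), p=MU)
        w = full_is_weights(actions)
        c = truncated_trace_weights(actions)
        print(f"T={horizon:2d}  mean(w)={w.mean():.3f}  var(w)={w.var():.2f} "
              f"(exact {exact_full_is_variance(horizon):.2f})  max(c)={c.max():.2f}")
```

The full weights stay unbiased (their mean is close to 1 under µ) while their variance explodes with the horizon. The truncated products never exceed 1, which bounds the variance at the price of bias.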