Review for NeurIPS paper: Continual Learning of Control Primitives: Skill Discovery via Reset-Games
–Neural Information Processing Systems
Additional Feedback: Line-by-line comments:
Line 129 - Seems to be missing a comma after the \ldots.
Line 177 - One theoretical point that seems hidden or ignored in this work is what the expectation in J_{forward} (and J_{reset}) really means. Because of the iterative and continual "reset, forward, reset, forward, ..." nature of the task, this expectation is (implicitly) taken after some arbitrary number of alternations between reset and forward episodes. This is perhaps fine if the initial states converge to some non-degenerate stationary distribution, but it ignores the very real possibility of inescapable terminal states. For example, if the reset policy always eventually throws the robot into a hole, then the stationary distribution places the robot in that hole and nothing can be learned.
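To make the concern about Line 177 concrete, here is a minimal toy sketch. It is not the paper's setup: the three-state chain, the transition matrix `P`, and the helper `state_distribution` are all hypothetical, chosen only to illustrate how an inescapable state takes over the initial-state distribution when reset and forward episodes alternate. Each row of `P` is assumed to summarize one full reset-then-forward round.

```python
import numpy as np

def state_distribution(P, start, n_rounds):
    """Distribution over initial states after n_rounds of reset + forward.

    P is a row-stochastic matrix: P[i, j] is the (hypothetical) probability
    that a round starting in state i leaves the robot in state j.
    """
    dist = np.asarray(start, dtype=float)
    for _ in range(n_rounds):
        dist = dist @ P  # one reset + forward round
    return dist

# Hypothetical chain: states 0 and 1 are ordinary start states,
# state 2 is the "hole" and is absorbing (it only transitions to itself).
P = np.array([
    [0.60, 0.35, 0.05],
    [0.30, 0.65, 0.05],
    [0.00, 0.00, 1.00],
])
start = np.array([1.0, 0.0, 0.0])  # the robot begins in state 0

for n in (1, 10, 100, 1000):
    print(n, state_distribution(P, start, n))
```

In this toy chain each round leaks 5% of the remaining probability mass into the hole, so the mass outside the hole shrinks as 0.95^n and the only stationary distribution is the degenerate one concentrated on the hole. Any expectation taken under that distribution carries no information about the other states.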
Jan-23-2025, 08:53:02 GMT