Generalized Hindsight for Reinforcement Learning

Neural Information Processing Systems

One of the key reasons for the high sample complexity in reinforcement learning (RL) is the inability to transfer knowledge from one task to another. In standard multi-task RL settings, low-reward data collected while trying to solve one task provides little to no signal for solving that particular task and is hence effectively wasted. However, we argue that this data, which is uninformative for one task, is likely a rich source of information for other tasks. To leverage this insight and efficiently reuse data, we present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks. Intuitively, given a behavior generated under one task, Generalized Hindsight returns a different task that the behavior is better suited for. Then, the behavior is relabeled with this new task before being used by an off-policy RL optimizer. Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient re-use of samples, which we empirically demonstrate on a suite of multi-task navigation and manipulation tasks.
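The relabeling idea described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: tasks are hypothetical 2-D goal positions, the reward is the negative distance to the goal, and the new task is chosen as the candidate with the highest trajectory return. That selection rule is a simple stand-in and is not necessarily the approximate inverse reinforcement learning procedure the paper uses.

```python
import math
from typing import List, Sequence, Tuple

State = Tuple[float, float]
Task = Tuple[float, float]  # hypothetical task parameterisation: a 2-D goal position


def reward(state: State, task: Task) -> float:
    """Hypothetical task-conditioned reward: negative distance to the goal."""
    return -math.dist(state, task)


def trajectory_return(states: Sequence[State], task: Task) -> float:
    """Undiscounted sum of rewards the trajectory would earn under `task`."""
    return sum(reward(s, task) for s in states)


def relabel(states: Sequence[State], original_task: Task,
            candidate_tasks: Sequence[Task]) -> Task:
    """Pick the task (original or candidate) under which the trajectory scores best.

    Assumption: argmax over return is an illustrative selection rule only.
    """
    candidates: List[Task] = [original_task, *candidate_tasks]
    return max(candidates, key=lambda t: trajectory_return(states, t))


# A trajectory collected while trying (and failing) to reach the goal (0, 5).
trajectory = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
original_task = (0.0, 5.0)
candidate_tasks = [(2.0, 0.0), (-3.0, 0.0)]

new_task = relabel(trajectory, original_task, candidate_tasks)

# Relabeled transitions (state, task, reward) that an off-policy learner could consume.
replay_buffer = [(s, new_task, reward(s, new_task)) for s in trajectory]
```

In this toy setup the trajectory earns a low return under its original task but ends exactly at one of the candidate goals, so relabeling turns otherwise uninformative data into a useful example for that other task.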


Review for NeurIPS paper: Generalized Hindsight for Reinforcement Learning

Neural Information Processing Systems

Weaknesses: The main weakness of the paper, in my opinion, is the lack of theoretical rigor to justify some of the claims, along with language that is often imprecise. For example, the description of the method in lines 55-56 is misleading: it suggests that the original trajectory with the originally intended task is discarded and only the relabeled one is used. Later in the paper, in Section 3 and in the algorithm box, the authors explain that they use the original task as well as the relabeled one. In the extreme case, we could imagine a situation where there is a set of successful trajectories for one task (potentially collected with another task in mind). In this case, the authors' algorithm would always pick the successful trajectories, even though we know that informative negatives are crucial for off-policy RL algorithms.


Review for NeurIPS paper: Generalized Hindsight for Reinforcement Learning

Neural Information Processing Systems

Reviewers were unanimously positive about this manuscript's clarity and contribution, and while acknowledging its shortcomings, all felt there was at least a weak case for acceptance. R1 and R2 were positive about the author rebuttal, and I'd encourage the authors to incorporate their responses to the reviewers' concerns into the camera-ready version.


Generalized Hindsight for Reinforcement Learning

arXiv.org Artificial Intelligence

One of the key reasons for the high sample complexity in reinforcement learning (RL) is the inability to transfer knowledge from one task to another. In standard multi-task RL settings, low-reward data collected while trying to solve one task provides little to no signal for solving that particular task and is hence effectively wasted. However, we argue that this data, which is uninformative for one task, is likely a rich source of information for other tasks. To leverage this insight and efficiently reuse data, we present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks. Intuitively, given a behavior generated under one task, Generalized Hindsight returns a different task that the behavior is better suited for. Then, the behavior is relabeled with this new task before being used by an off-policy RL optimizer. Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which we empirically demonstrate on a suite of multi-task navigation and manipulation tasks. Videos and code can be accessed here: https://sites.google.com/view/generalized-hindsight.