Information-theoretic Task Selection for Meta-Reinforcement Learning
In meta-reinforcement learning (meta-RL), an agent is trained on a set of tasks to prepare for, and learn faster in, new, unseen, but related tasks. The training tasks are usually hand-crafted to be representative of the expected distribution of target tasks, and are therefore all used in training. We show that, given a set of training tasks, learning can be both faster and more effective (leading to better performance on the target tasks) if the training tasks are appropriately selected. We propose a task selection algorithm based on information theory, which optimizes the set of tasks used for training in meta-RL, irrespective of how they are generated. The algorithm establishes which training tasks are both sufficiently relevant to the target tasks and sufficiently different from one another. We reproduce several meta-RL experiments from the literature and show that our task selection algorithm improves the final performance in all of them.
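The abstract states the selection criterion only at a high level: keep training tasks that are relevant to the targets and different from each other. As a minimal sketch of that criterion, assuming precomputed `relevance` scores (training task vs. target tasks) and pairwise `difference` scores (e.g., divergences between the tasks' policies), a greedy filter could look as follows. The function name, the score definitions, and the two thresholds are illustrative assumptions for this sketch, not the paper's actual algorithm.

```python
def select_tasks(candidate_ids, relevance, difference,
                 min_relevance, min_difference):
    """Greedy task-selection sketch.

    relevance[i]     -- assumed precomputed relevance of training task i
                        to the target tasks (e.g., an information-theoretic
                        divergence between task i's policy and policies
                        adapted on the target tasks).
    difference[i][j] -- assumed precomputed divergence between the
                        policies of training tasks i and j.
    A task is kept if it is relevant enough to the targets and different
    enough from every task already selected.
    """
    selected = []
    for i in candidate_ids:
        if relevance[i] < min_relevance:
            continue  # not informative about the target tasks
        if all(difference[i][j] >= min_difference for j in selected):
            selected.append(i)  # adds new information w.r.t. current set
    return selected


# Toy illustration with four candidate tasks and made-up scores.
relevance = [0.9, 0.2, 0.8, 0.7]
difference = [
    [0.0, 0.5, 0.1, 0.6],
    [0.5, 0.0, 0.4, 0.3],
    [0.1, 0.4, 0.0, 0.7],
    [0.6, 0.3, 0.7, 0.0],
]
print(select_tasks(range(4), relevance, difference,
                   min_relevance=0.5, min_difference=0.3))
# -> [0, 3]: task 1 is not relevant enough, and task 2, while relevant,
#    is too similar to the already-selected task 0.
```

The key design choice this sketch illustrates is that the two conditions play different roles: relevance is an absolute filter against the target tasks, while difference is checked incrementally against the growing selected set, so the result depends on the order in which candidates are visited.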
Review for NeurIPS paper: Information-theoretic Task Selection for Meta-Reinforcement Learning
Summary and Contributions: [UPDATE] I have read the rebuttal, and I still believe the authors should work on the clarity of the experiment descriptions. I do not dispute that this paper has committed the common sin of saying "We assume the standard meta-RL framework" and moving on. However, I believe three points are in favour of this paper:
- The authors' response seems to indicate that the reviewers' message has been heard and more details are going to be included; I would actually prefer they did not clutter the main paper with these details because ...
- The meta-RL methodology for these tasks is very well known and "standard", so if they made changes, it is likely that they made the tasks harder, not easier. There are dozens of papers, perhaps more, building on this methodology from 2017 onwards, many in top-tier conferences, and a majority do not describe the tasks in detail in the main paper. I would still like harder domains, but I can't disregard the presented evidence (yet).
Meta-Review for NeurIPS paper: Information-theoretic Task Selection for Meta-Reinforcement Learning
This paper was quite controversial among the four reviewers, leading to more than 10 pages of discussion (longer than the paper itself!). In the end, two reviewers were advocating for acceptance (R1, R3), one was advocating for rejection (R2), and one was leaning towards rejection (R4). This is a direction that has not been studied before, and it will likely become quite relevant in settings where the task distribution is heterogeneous. The experimental results suggest that the algorithm performs very well on a large number of simple domains when combined with MAML and RL². The experiments also include an ablation study. Time complexity is not an issue; the reviewers appreciated the author response here. These are the main reasons that R1 and R3 were advocating for acceptance. I agree that these are strong points, and they make me want to accept the paper.