Supplementary Material of Towards Enabling Meta-Learning from Target Models

Neural Information Processing Systems

This is the supplementary material for the paper "Towards Enabling Meta-Learning from Target Models". It provides implementation details, further discussion, and additional experimental results.



Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks

Neural Information Processing Systems

In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over m tasks, each with n data points, and characterize its generalization error from two points of view. First, we assume the new task at test time is one of the training tasks, and we show that, for strongly convex objective functions, the expected excess population loss is bounded by O(1/mn). Second, we consider the MAML algorithm's generalization to an unseen task and show that the resulting generalization error depends on the total variation distance between the underlying distributions of the new task and the tasks observed during the training process. Our proof techniques rely on the connections between algorithmic stability and generalization bounds of algorithms. In particular, we propose a new definition of stability for meta-learning algorithms, which allows us to capture the role of both the number of tasks m and the number of samples per task n on the generalization error of MAML.
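To make the setting concrete, the following is a minimal sketch of one-step MAML over m tasks with n data points each, using a ridge-regularized least-squares loss as a stand-in for a strongly convex objective. It is not the paper's code: the loss, the synthetic task generator `make_tasks`, the step sizes `alpha` and `beta`, the regularizer `lam`, and the reuse of the same n points for the inner and outer steps are all assumptions made for illustration.

```python
import numpy as np

def task_loss(w, X, y, lam):
    # Ridge-regularized least squares: strongly convex whenever lam > 0.
    r = X @ w - y
    return 0.5 * (r @ r) / len(y) + 0.5 * lam * (w @ w)

def task_grad(w, X, y, lam):
    return X.T @ (X @ w - y) / len(y) + lam * w

def task_hessian(X, lam):
    return X.T @ X / X.shape[0] + lam * np.eye(X.shape[1])

def maml_objective(w, tasks, alpha, lam):
    # Average loss after one inner gradient step on each task.
    return float(np.mean([
        task_loss(w - alpha * task_grad(w, X, y, lam), X, y, lam)
        for X, y in tasks
    ]))

def maml_grad(w, tasks, alpha, lam):
    # Exact gradient of maml_objective via the chain rule:
    # (I - alpha * H_i) @ grad L_i(w - alpha * grad L_i(w)), averaged over tasks.
    g = np.zeros_like(w)
    for X, y in tasks:
        w_adapted = w - alpha * task_grad(w, X, y, lam)
        jac = np.eye(len(w)) - alpha * task_hessian(X, lam)
        g += jac @ task_grad(w_adapted, X, y, lam)
    return g / len(tasks)

def make_tasks(m, n, d, rng):
    # Hypothetical synthetic tasks: each has its own linear ground truth.
    tasks = []
    for _ in range(m):
        w_true = rng.normal(size=d)
        X = rng.normal(size=(n, d))
        y = X @ w_true + 0.1 * rng.normal(size=n)
        tasks.append((X, y))
    return tasks

def train_maml(tasks, d, alpha=0.05, beta=0.1, lam=0.1, steps=200):
    # Plain gradient descent on the meta-objective, starting from zero.
    w = np.zeros(d)
    for _ in range(steps):
        w = w - beta * maml_grad(w, tasks, alpha, lam)
    return w
```

Calling `train_maml(make_tasks(5, 20, 3, np.random.default_rng(0)), 3)` returns a meta-initialization for m = 5 tasks with n = 20 points each. The sketch only illustrates the training setup; it does not reproduce the generalization bounds discussed in the abstract.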


Learning where to learn: Gradient sparsity in meta and continual learning

Neural Information Processing Systems

Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process, with the pattern of sparsity varying on a problem-by-problem basis.
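One way to picture "learning where to learn" is an inner-loop update in which a mask selects the weights that are allowed to change. The sketch below is an illustration only, not the paper's method: the function `masked_inner_step`, the real-valued `scores` that are thresholded at zero to form the mask, and the step size `lr` are all hypothetical, and the outer-loop procedure that would learn the scores is not shown.

```python
import numpy as np

def masked_inner_step(w, grad, scores, lr=0.1):
    """One gradient step that only updates weights whose score is positive.

    `scores` is a hypothetical per-weight quantity assumed to be meta-learned
    elsewhere; here it is simply thresholded into a binary mask.
    """
    mask = (scores > 0).astype(w.dtype)
    # Weights with mask == 0 keep their meta-learned initial value.
    return w - lr * mask * grad, mask
```

With `w = np.zeros(3)`, `grad = np.ones(3)` and `scores = np.array([1.0, -1.0, 2.0])`, only the first and third weights move, so the update is sparse in the sense described above.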






General Response

Neural Information Processing Systems

We thank all the reviewers for their insightful and encouraging comments. Per your suggestion, we will update the appendix by adding more explanation of the proof ideas. Similarly, we can extend the convergence results of Theorem 4 in the Appendix from FS to IFS. We expect that the approach developed in this paper will fuel this future investigation. We will incorporate these changes into the revision.