Review for NeurIPS paper: Modular Meta-Learning with Shrinkage

Weaknesses:

A. Major concerns

1. Can you comment on the choice of a Normal prior for your shrinkage variants, as opposed to a sparsity-inducing prior such as a Laplace or spike-and-slab prior? Sparsity-inducing priors would arguably be a more natural fit for modularity, since some layers could then require no adaptation at all, rather than merely a small adaptation. The experiments do show that the sigma versions of the different algorithms learn different scales of adaptation. However, there is no experiment demonstrating a benefit of these approaches for the aspects that motivated the work (interpretability, causality, transfer learning, or domain adaptation) beyond standard performance in the few-shot learning setting.

B. Moderate concerns

1. Lines 27-28: "As data increases, these hard-coded modules may become a bottleneck for further improvement." Yet all of the experiments in this paper are in the few-shot learning setting, so this motivation is not empirically supported.
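To make concern A.1 concrete, here is a toy one-dimensional sketch (not the paper's method; the functions `gaussian_map` and `laplace_map` are illustrative names) contrasting MAP estimation under a Normal prior with a Laplace prior. The Normal prior shrinks a per-layer adaptation multiplicatively and never to exactly zero, whereas the Laplace prior soft-thresholds, sending small adaptations exactly to zero, which is the "no adaptation at all" behaviour the review asks about:

```python
import numpy as np

def gaussian_map(y, tau2=1.0, s2=1.0):
    """MAP estimate of theta given y ~ N(theta, s2) and theta ~ N(0, tau2).
    Multiplicative shrinkage: nonzero whenever y is nonzero."""
    return y * tau2 / (tau2 + s2)

def laplace_map(y, b=1.0, s2=1.0):
    """MAP estimate under a Laplace(0, b) prior: soft-thresholding with
    threshold lam = s2 / b. Signals below the threshold map to exactly 0."""
    lam = s2 / b
    return np.sign(y) * max(abs(y) - lam, 0.0)

# A small per-layer adaptation survives Gaussian shrinkage
# but is zeroed out entirely by the Laplace prior.
print(gaussian_map(0.1))          # shrunk, but nonzero: 0.05
print(laplace_map(0.1, b=2.0))    # exactly 0.0 (threshold s2/b = 0.5 > |y|)
```

A spike-and-slab prior would go further, placing explicit point mass at zero per module, but even this simple contrast shows why a sparsity-inducing prior might yield more interpretable, discretely modular adaptation.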