 Transfer Learning


Review for NeurIPS paper: Modular Meta-Learning with Shrinkage

Neural Information Processing Systems

Weaknesses: A. Major concerns 1. Can you comment on the choice of a Normal prior for your shrinkage variants, as opposed to a sparsity-inducing prior, such as a Laplace or a Spike and Slab prior? Sparsity-inducing priors would probably be more intuitive for better modularity (where some layers would require no adaptation at all, as opposed to a small adaptation). The experiments do show that the sigma versions of the different algorithms learn different scales of adaptation. However, there is no experiment showing the benefit of these approaches for some of the aspects that motivated this approach (interpretability, causality, transfer learning or domain adaptation), beyond the standard performance in the few-shot learning setting. B. Moderate concerns 1. Lines 27-28: "As data increases, these hard-coded modules may become a bottleneck for further improvement." In all the experiments of this paper, we are in the few-shot learning setting.
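The contrast the reviewer draws between a Normal prior and a sparsity-inducing prior can be illustrated with a toy scalar example. The sketch below is not the paper's method: it assumes a single observed offset `y` with Gaussian noise and compares the MAP estimate under a zero-mean Normal prior (proportional shrinkage, never exactly zero for nonzero `y`) with the MAP estimate under a zero-mean Laplace prior (soft-thresholding, exactly zero for small `y`). The function names and the parameters `prior_std`, `prior_scale` and `noise_std` are hypothetical.

```python
def map_gaussian(y, prior_std, noise_std=1.0):
    """MAP of theta for y ~ N(theta, noise_std^2) with theta ~ N(0, prior_std^2).

    The estimate is y scaled by a factor in (0, 1): small but nonzero adaptation.
    """
    weight = prior_std ** 2 / (prior_std ** 2 + noise_std ** 2)
    return weight * y


def map_laplace(y, prior_scale, noise_std=1.0):
    """MAP of theta for y ~ N(theta, noise_std^2) with theta ~ Laplace(0, prior_scale).

    Minimizing (y - theta)^2 / (2 noise_std^2) + |theta| / prior_scale gives
    soft-thresholding: offsets below the threshold become exactly zero.
    """
    threshold = noise_std ** 2 / prior_scale
    if abs(y) <= threshold:
        return 0.0
    return y - threshold if y > 0 else y + threshold


if __name__ == "__main__":
    # Hypothetical per-layer offsets: two small ones and one large one.
    for offset in (0.05, -0.3, 2.0):
        print(offset, map_gaussian(offset, 1.0), map_laplace(offset, 1.0))
```

With these toy settings, the Normal prior halves every offset, while the Laplace prior sets the two small offsets to exactly zero and keeps a reduced version of the large one, which is the "no adaptation at all" behavior the reviewer has in mind.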


Review for NeurIPS paper: Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks

Neural Information Processing Systems

I think the transfer distance can be interpreted as a measure of transferability, and the transfer distance defined in the paper seems to suggest that transfer learning is possible only when W_S and W_T are close to each other under the \Sigma_T norm. I understand that this definition is motivated by Proposition 1, but this is not always how people apply transfer learning in practice. In over-parametrized neural networks, two very different sets of weights could both yield well-performing models, yet some learned feature mappings can still be transferred to various tasks. Thus, I believe the transfer distance defined here does not fully characterize transferability as people generally discuss it. Since the lower bound does not just characterize the rate of convergence, I would like to see the phase transition behavior of the bound between different regimes; a discontinuity would suggest that the lower bound is not tight at these points.
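To make "close under the \Sigma_T norm" concrete, here is a minimal sketch. It assumes, for the linear case, a distance of the form sqrt(trace((W_S - W_T) \Sigma_T (W_S - W_T)^T)); the exact definition should be checked against the paper. The function name and the example matrices are hypothetical.

```python
import math


def transfer_distance(w_source, w_target, sigma_target):
    """Sigma_T-weighted distance between two weight matrices (lists of rows).

    Assumed form: sqrt(sum over rows d of d^T Sigma_T d), where d is a row of
    W_S - W_T. This equals sqrt(trace(D Sigma_T D^T)) with D = W_S - W_T.
    """
    total = 0.0
    for row_s, row_t in zip(w_source, w_target):
        diff = [s - t for s, t in zip(row_s, row_t)]
        # Quadratic form diff^T Sigma_T diff for this output row.
        for i, d_i in enumerate(diff):
            for j, d_j in enumerate(diff):
                total += d_i * sigma_target[i][j] * d_j
    return math.sqrt(total)


if __name__ == "__main__":
    w_s = [[1.0, 0.0]]
    w_t = [[1.0, 5.0]]
    # Target covariance with very little variance along the second feature.
    sigma_t = [[1.0, 0.0], [0.0, 1e-4]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(transfer_distance(w_s, w_t, identity))  # plain Euclidean distance
    print(transfer_distance(w_s, w_t, sigma_t))   # Sigma_T-weighted distance
```

In this toy case the two weight vectors are far apart in Euclidean terms but close under the weighted norm, because they differ only along a direction in which the target inputs barely vary. The measure still compares weights directly, which is the limitation the reviewer raises for over-parametrized networks.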


Review for NeurIPS paper: Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks

Neural Information Processing Systems

This paper addresses the problem of inductive transfer with one-hidden-layer neural networks or linear models and proposes minimax lower bounds for these models. The three reviewers and the AC agree that it is a well-written paper that studies an important problem. The proposed fine-grained minimax rate for transfer learning is a nice contribution to this field. Although the setting is somewhat simple, this work is inspiring for studying inductive transfer with neural networks. There are still some minor concerns about the organization of the paper and the evaluation of the proposed lower bound, which should be fully addressed in the camera-ready version.


Review for NeurIPS paper: A Combinatorial Perspective on Transfer Learning

Neural Information Processing Systems

Additional Feedback: My major concern is that the authors have only applied their method to variants of MNIST. While the experiments performed are indeed from the established Continual Learning benchmarks in prior work, they do not suffice to showcase the true complexity of the continual learning challenge. I would strongly recommend doing at least some RL experiments, for instance, as performed in the Online EWC paper. Secondly, as mentioned above, the descriptions of GGM, FMN and NCTL are quite terse and need to be re-read a couple of times to make sense of them. I'd recommend simplifying these descriptions for an easier flow and deferring the details to an appendix.


Review for NeurIPS paper: A Combinatorial Perspective on Transfer Learning

Neural Information Processing Systems

This paper studies continual learning that does not require task boundary and identity information and proposes a novel model ensemble method from the combinatorial perspective for this problem. All reviewers and the AC agree that this paper opens a novel and promising direction. The authors also design a delicate algorithm by introducing non-stationary learning techniques to solve this problem. The experimental results of this method are somewhat weak in several aspects, but given the inherent difficulty of online continual learning, they are convincing enough to justify the main ideas and proposed methods. Note that after the rebuttal and discussion phases, several major concerns remain: First, the empirical evaluation is not realistic in terms of task diversity and scalability.



Review for NeurIPS paper: What is being transferred in transfer learning?

Neural Information Processing Systems

This paper provides experimental results and analyses from multiple perspectives to reveal what enables a successful transfer and which part of the network is responsible for it. The reviewers and AC unanimously agree that this paper is well written, proposes new tools for understanding transfer learning, and provides novel and important insights. The rebuttal addresses most of the concerns raised by the reviewers. After the rebuttal, one reviewer still has concerns about the practical value of these insights, since they do not yield a concrete technique for improving transfer performance. The paper is recommended for acceptance.


Tackling Small Sample Survival Analysis via Transfer Learning: A Study of Colorectal Cancer Prognosis

arXiv.org Artificial Intelligence

Survival prognosis is crucial for medical informatics. Practitioners often confront small-sized clinical data, especially cancer patient cases, which can be insufficient to induce useful patterns for survival predictions. This study deals with small sample survival analysis by leveraging transfer learning, a useful machine learning technique that can enhance the target analysis with related knowledge pre-learned from other data. We propose and develop various transfer learning methods designed for common survival models. For parametric models such as DeepSurv, Cox-CC (Cox-based neural networks), and DeepHit (end-to-end deep learning model), we apply standard transfer learning techniques like pretraining and fine-tuning. For non-parametric models such as Random Survival Forest (RSF), we propose a new transfer survival forest (TSF) model that transfers tree structures from source tasks and fine-tunes them with target data. We evaluated the transfer learning methods on colorectal cancer (CRC) prognosis. The source data are 27,379 SEER CRC stage I patients, and the target data are 728 CRC stage I patients from the West China Hospital. When enhanced by transfer learning, Cox-CC's $C^{td}$ value was boosted from 0.7868 to 0.8111, DeepHit's from 0.8085 to 0.8135, DeepSurv's from 0.7722 to 0.8043, and RSF's from 0.7940 to 0.8297 (the highest performance). All models trained with as few as 50 samples demonstrated even more significant improvement. In conclusion, the current survival models used for cancer prognosis can be improved by properly designed transfer learning techniques. The source code used in this study is available at https://github.com/YonghaoZhao722/TSF.


Reviews: Learning to learn by gradient descent by gradient descent

Neural Information Processing Systems

Comments (mostly on related work): Authors: "The idea of using learning to learn or meta-learning to acquire knowledge or inductive biases has a long history [Thrun and Pratt, 1998]." But the intro to this reference is muddying the waters by confusing meta-learning (which is about learning the learning algorithm itself) and transfer learning, subsuming basically everything under "meta-learning," even standard back-propagation, because it can be applied to some data set, and then may learn new data points more quickly (so this is just standard transfer learning). To my knowledge, the first work on learning general learning algorithms written in a universal programming language was published in 1987: J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Authors: "This work was built on by [Younger et al., 2001, Hochreiter et al., 2001] wherein a higher-level network act as a gradient descent procedure, with both levels trained during learning. Alternatively Schmidhuber [1992, 1993] considers networks that are able to modify their own behavior and act as an alternative to recurrent networks in meta-learning. Note, however that these earlier works do not directly address the transfer of a learned training procedure to novel problem instances and instead focus on adaptivity in the online setting."


Reviews: Learning Bound for Parameter Transfer Learning

Neural Information Processing Systems

The parameter transfer learning framework described in the paper is very interesting and deserves attention. The approach taken by the authors (described in Section 2) is sound but lacks clarity. The notation is well chosen, but not always properly explained (see my "Specific comments" below). Also, as the transfer learning framework is very similar to domain adaptation, which is studied in many papers (such as Ben-David et al. 2007, cited by the authors), it would be interesting to discuss the connection of Theorem 1 with existing domain adaptation results. Section 3 is difficult to follow for a reader not familiar with sparse coding (like myself).