Review for NeurIPS paper: Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks
Neural Information Processing Systems
I think the transfer distance can be interpreted as a measure of transferability, and the definition in the paper suggests that transfer learning is possible only when W_S and W_T are close under the \Sigma_T norm. I understand that this definition is motivated by Proposition 1, but it does not reflect how transfer learning is typically applied in practice. In over-parametrized neural networks, two very different sets of weights can both yield well-performing models, and the learned feature mappings can still transfer to a variety of tasks. I therefore believe the transfer distance defined here does not fully characterize transferability as it is generally discussed. Moreover, since the lower bound characterizes more than the rate of convergence, I would like to see the phase-transition behavior of the bound across the different regimes; a discontinuity would suggest that the lower bound is not tight at those points.
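To make the over-parametrization concern concrete, here is a minimal sketch (my own illustration, not from the paper) of a one-hidden-layer ReLU network where permuting the hidden units leaves the function unchanged yet makes the weight-space distance large; for simplicity I take \Sigma_T = I, so the assumed transfer distance reduces to the Frobenius norm of W_S - W_T:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 4  # input dimension, hidden width

# Source weights of a one-hidden-layer ReLU network f(x) = 1^T relu(W x).
W_S = rng.standard_normal((k, d))

# "Target" weights: the same network with hidden units cyclically permuted.
# With uniform output weights, the computed function is identical.
perm = np.array([1, 2, 3, 0])
W_T = W_S[perm]

def f(W, x):
    # Sum of ReLU hidden units, i.e. output weights all equal to 1.
    return np.maximum(W @ x, 0.0).sum()

x = rng.standard_normal(d)
assert np.isclose(f(W_S, x), f(W_T, x))  # identical predictions

# Transfer distance under the assumed Sigma_T = I weighting:
# nonzero (in fact large) even though the two networks agree everywhere.
dist = np.linalg.norm(W_S - W_T)
print(dist)
```

The point is that weight-space closeness is sufficient but not necessary for functional closeness, which is exactly why the transfer distance may understate transferability.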
Jan-21-2025, 23:21:39 GMT