Review for NeurIPS paper: Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks

Neural Information Processing Systems 

I think the transfer distance can be interpreted as a measure of transferability, and the transfer distance defined in the paper seems to suggest that transfer learning is possible only when W_S and W_T are close to each other under the \Sigma_T norm. I understand that this definition is motivated by Proposition 1, but this is not always how people apply transfer learning in practice. In over-parametrized neural networks, two very different sets of weights can both yield well-performing models, yet some of the learned feature mappings can still be transferred to various tasks. Thus, I believe the transfer distance defined here does not fully characterize transferability as it is generally discussed. Since the lower bound does not merely characterize the rate of convergence, I would like to see the phase-transition behavior of the bound between the different regimes; a discontinuity would suggest that the lower bound is not tight at those points.
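To make the over-parametrization concern concrete, here is a minimal sketch of one way two very different weight configurations can represent the same model. It relies on the hidden-unit permutation symmetry of a one-hidden-layer ReLU network. The function names, the dimensions, the choice \Sigma_T = I, and the distance formula (a Frobenius norm weighted by \Sigma_T) are assumptions made for illustration only; they are not taken from the paper, whose exact definition may differ.

```python
import numpy as np

def one_hidden_relu(x, W, V):
    """f(x) = V @ relu(W @ x) for a one-hidden-layer ReLU network."""
    return V @ np.maximum(W @ x, 0.0)

def sigma_weighted_distance(W_a, W_b, Sigma):
    """Assumed stand-in for a Sigma-weighted distance between hidden-layer
    weights: sqrt(trace(D Sigma D^T)) with D = W_a - W_b."""
    D = W_a - W_b
    return float(np.sqrt(np.trace(D @ Sigma @ D.T)))

rng = np.random.default_rng(0)
d, k, out = 5, 8, 3                      # input dim, hidden units, outputs (illustrative)
W_S = rng.normal(size=(k, d))            # "source" hidden-layer weights
V_S = rng.normal(size=(out, k))          # "source" output-layer weights

# Build "target" weights by cyclically shifting the hidden units.
# Rows of W and columns of V are permuted together, so the function is unchanged.
perm = np.roll(np.arange(k), 1)
W_T = W_S[perm]
V_T = V_S[:, perm]

Sigma_T = np.eye(d)                      # assumed identity target covariance
X = rng.normal(size=(d, 100))            # 100 random test inputs, one per column

max_output_gap = float(
    np.max(np.abs(one_hidden_relu(X, W_S, V_S) - one_hidden_relu(X, W_T, V_T)))
)
distance = sigma_weighted_distance(W_S, W_T, Sigma_T)

print(f"largest output difference: {max_output_gap:.2e}")
print(f"weighted distance between W_S and W_T: {distance:.2f}")
```

In this sketch the two networks agree on every input up to floating-point rounding, while the weight-space distance between them is far from zero. A distance computed directly on the weights would therefore call these two identical models far apart, which is the gap between the paper's notion and practical transferability that I am pointing at.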