Improving Knowledge Distillation in Transfer Learning with Layer-wise Learning Rates