Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

Open in new window