Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation
