On Layer Normalization in the Transformer Architecture
