Symmetric Dot-Product Attention for Efficient Training of BERT Language Models