GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
