Memorization Capacity of Multi-Head Attention in Transformers
