Does Self-Attention Need Separate Weights in Transformers?