Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers