Strassen Attention, Split VC Dimension and Compositionality in Transformers
