Share Your Attention: Transformer Weight Sharing via Matrix-based Dictionary Learning
