Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
