Simple linear attention language models balance the recall-throughput tradeoff
