Simple linear attention language models balance the recall-throughput tradeoff