The emergence of sparse attention: impact of data distribution and benefits of repetition
