Clustering in Causal Attention Masking