Transformer-based Causal Language Models Perform Clustering