Transformers versus the EM Algorithm in Multi-class Clustering