Emergence of meta-stable clustering in mean-field transformer models
