Transformers can optimally learn regression mixture models
