Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
