Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models