Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?