Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
