Transformers Provably Learn Feature-Position Correlations in Masked Image Modeling
