Transformers Provably Learn Feature-Position Correlations in Masked Image Modeling