Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

Lirui Wang¹, Xinlei Chen