Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies