VIRT: Vision Instructed Transformer for Robotic Manipulation