CLIP4MC: An RL-Friendly Vision-Language Model for Minecraft