Perception Tokens Enhance Visual Reasoning in Multimodal Language Models