TRINS: Towards Multimodal Language Models that Can Read