TouchStone: Evaluating Vision-Language Models by Language Models