Where do Large Vision-Language Models Look at when Answering Questions?