Improving Automatic VQA Evaluation Using Large Language Models