Gemini Pro Defeated by GPT-4V: Evidence from Education
Lee, Gyeong-Geon, Latif, Ehsan, Shi, Lehong, Zhai, Xiaoming
arXiv.org Artificial Intelligence
This study compared the classification performance of Gemini Pro and GPT-4V in educational settings. Using visual question answering (VQA) techniques, the study examined both models' abilities to read text-based rubrics and then automatically score student-drawn models in science education. We conducted both quantitative and qualitative analyses on a dataset of student-drawn scientific models, using the NERIF (Notation-Enhanced Rubrics for Image Feedback) prompting method. The findings reveal that GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and Quadratic Weighted Kappa. The qualitative analysis suggests that the differences may be due to the models' ability to process fine-grained text in images and to their overall image classification performance. Even when the NERIF approach was adapted by further reducing the size of the input images, Gemini Pro did not appear to perform as well as GPT-4V. The findings suggest GPT-4V's superior capability in handling complex multimodal educational tasks. The study concludes that while both models represent advancements in AI, GPT-4V's higher performance makes it a more suitable tool for educational applications involving multimodal data interpretation.
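The two metrics named in the abstract, scoring accuracy and Quadratic Weighted Kappa, can be illustrated with a minimal sketch. It assumes scores are integer levels from 0 to `n_classes - 1`; the function names and the sample scores are hypothetical and are not taken from the study.

```python
def accuracy(human, machine):
    """Fraction of items where the machine score equals the human score."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def quadratic_weighted_kappa(human, machine, n_classes):
    """Agreement between two raters, penalizing disagreements by squared distance.

    Assumes scores are integers in range(n_classes) (an illustrative convention).
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("need at least two score levels")

    n = len(human)
    k = n_classes

    # Observed confusion matrix: rows are human scores, columns are machine scores.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h][m] += 1

    # Marginal counts for each rater.
    hist_h = [sum(row) for row in observed]
    hist_m = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * hist_h[i] * hist_m[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single level")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a three-level rubric (0, 1, 2).
human_scores = [0, 1, 2, 2]
machine_scores = [0, 1, 2, 1]
print(accuracy(human_scores, machine_scores))
print(quadratic_weighted_kappa(human_scores, machine_scores, n_classes=3))
```

Unlike plain accuracy, the quadratic weighting treats a score that is two levels off as four times worse than a score that is one level off, which is why the two metrics can rank models differently.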
Dec-26-2023
- Country:
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- North America > United States
- Georgia > Clarke County > Athens (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Education
- Curriculum > Subject-Specific Education (0.34)
- Educational Setting (0.66)
- Health & Medicine (1.00)
- Technology: