Gemini Pro Defeated by GPT-4V: Evidence from Education

Lee, Gyeong-Geon, Latif, Ehsan, Shi, Lehong, Zhai, Xiaoming

Dec-26-2023, arXiv.org Artificial Intelligence

This study compared the classification performance of Gemini Pro and GPT-4V in educational settings. Using visual question answering (VQA) techniques, the study examined both models' abilities to read text-based rubrics and then automatically score student-drawn models in science education. We conducted both quantitative and qualitative analyses on a dataset of student-drawn scientific models, using the NERIF (Notation-Enhanced Rubrics for Image Feedback) prompting method. The findings reveal that GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and Quadratic Weighted Kappa. The qualitative analysis suggests that the differences may stem from the models' ability to process fine-grained text in images and from their overall image classification performance. Even when the NERIF approach was adapted by further reducing the size of the input images, Gemini Pro did not appear to perform as well as GPT-4V. The findings suggest GPT-4V's superior capability in handling complex multimodal educational tasks. The study concludes that while both models represent advancements in AI, GPT-4V's higher performance makes it a more suitable tool for educational applications involving multimodal data interpretation.
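For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how scoring accuracy and Quadratic Weighted Kappa can be computed between human-assigned and model-assigned scores. It is not the authors' evaluation code. The function names, the encoding of score levels as integers 0..k-1, and the three-level example data are assumptions made purely for illustration.

```python
def accuracy(human, model):
    """Fraction of items where the model score equals the human score."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, model)) / len(human)


def quadratic_weighted_kappa(human, model, num_levels):
    """Quadratic Weighted Kappa for integer scores in 0..num_levels-1.

    Disagreements are penalised by the squared distance between the two
    scores, so a two-level miss costs more than a one-level miss.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if num_levels < 2:
        raise ValueError("need at least two score levels")

    n = len(human)
    k = num_levels

    # Observed confusion matrix: rows are human scores, columns model scores.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, model):
        observed[h][m] += 1

    # Marginal counts for each rater.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_human[i] * hist_model[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one and the same level for every item.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a three-level rubric (0, 1, 2); not data from the study.
human_scores = [0, 1, 2, 2, 1, 0]
model_scores = [0, 1, 1, 2, 2, 0]

acc = accuracy(human_scores, model_scores)
qwk = quadratic_weighted_kappa(human_scores, model_scores, num_levels=3)
print(f"accuracy={acc:.3f}, QWK={qwk:.3f}")
```

Accuracy treats every mismatch equally, whereas Quadratic Weighted Kappa corrects for chance agreement and weights larger score gaps more heavily, which is why the two are usually reported together for ordinal scoring tasks.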

Country:
- North America > United States
  - Georgia > Clarke County > Athens (0.14)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (1.00)
- Education
  - Educational Setting (0.66)
  - Curriculum > Subject-Specific Education (0.34)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)