Evaluating Gemini in an arena for learning

LearnLM Team; Modi, Abhinit; Veerubhotla, Aditya Srikanth; Rysbek, Aliya; Huber, Andrea; Anand, Ankit; Bhoopchand, Avishkar; Wiltshire, Brett; Gillick, Daniel; Kasenberg, Daniel; Sgouritsa, Eleni; Elidan, Gal; Liu, Hengrui; Winnemoeller, Holger; Jurenka, Irina; Cohan, James; She, Jennifer; Wilkowski, Julia; Alarakyia, Kaiz; McKee, Kevin R.; Singh, Komal; Wang, Lisa; Kunesch, Markus; Pîslar, Miruna; Efron, Niv; Mahmoudieh, Parsa; Kamienny, Pierre-Alexandre; Wiltberger, Sara; Mohamed, Shakir; Agarwal, Shashank; Phal, Shubham Milind; Lee, Sun Jae; Strinopoulos, Theofilos; Ko, Wei-Jen; Gold-Zamir, Yael; Haramaty, Yael; Assael, Yannis

Jun-2-2025–arXiv.org Artificial Intelligence

Artificial intelligence (AI) is poised to transform education, but the research community lacks a robust, general benchmark to evaluate AI models for learning. To assess state-of-the-art support for educational use cases, we ran an "arena for learning" in which educators and pedagogy experts conducted blind, head-to-head, multi-turn comparisons of leading AI models. In particular, $N = 189$ educators drew from their experience to role-play realistic learning use cases, interacting with two models sequentially, after which $N = 206$ experts judged which model better supported the user's learning goals. The arena evaluated a slate of state-of-the-art models: Gemini 2.5 Pro, Claude 3.7 Sonnet, GPT-4o, and OpenAI o3. Excluding ties, experts preferred Gemini 2.5 Pro in 73.2% of its match-ups, ranking it first overall in the arena. Gemini 2.5 Pro also demonstrated markedly higher performance across key principles of good pedagogy. Altogether, these results position Gemini 2.5 Pro as a leading model for learning.
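
As a rough illustration of how a tie-excluded preference rate like the one quoted above can be computed from pairwise judgments, here is a minimal Python sketch. The function name `preference_rate`, the record layout `(model_a, model_b, winner)`, and the toy outcomes are all assumptions made for illustration; they are not taken from the paper, and the paper's actual aggregation procedure may differ.

```python
def preference_rate(outcomes, model):
    """Share of decided (non-tied) match-ups involving `model` that it won.

    `outcomes` is a list of (model_a, model_b, winner) tuples, where
    `winner` is one of the two model names, or None for a tie.
    This layout is a hypothetical one chosen for the example.
    """
    wins = losses = 0
    for a, b, winner in outcomes:
        if model not in (a, b) or winner is None:
            continue  # skip match-ups without this model, and skip ties
        if winner == model:
            wins += 1
        else:
            losses += 1
    decided = wins + losses
    return wins / decided if decided else float("nan")


# Made-up judgments, for illustration only (not data from the study).
toy_outcomes = [
    ("model_a", "model_b", "model_a"),
    ("model_a", "model_c", None),       # tie: excluded from the rate
    ("model_b", "model_a", "model_a"),
    ("model_a", "model_c", "model_c"),
    ("model_b", "model_c", "model_b"),  # does not involve model_a
]

rate = preference_rate(toy_outcomes, "model_a")
print(f"model_a preferred in {rate:.1%} of decided match-ups")
```

Excluding ties from the denominator means the rate reads as "how often the model was chosen when a judge expressed a preference", which is why it can differ noticeably from a win rate computed over all match-ups.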

large language model, machine learning, natural language

Genre:
- Research Report (1.00)

Industry:
- Government (0.68)
- Education
  - Educational Setting (0.93)
  - Educational Technology > Educational Software (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.35)
