Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring
Oketch, Kezia; Lalor, John P.; Yang, Yi; Abbasi, Ahmed
The rapid development of machine learning (ML) technologies, particularly large language models (LLMs), has led to major advancements in natural language processing (NLP; Abbasi et al. 2023). While much of this advancement happened under the umbrella of the common task framework, which espouses transparency and openness (Abbasi et al. 2023), in recent years closed LLMs such as GPT-3 and GPT-4 have set new performance standards on tasks ranging from text generation to question answering, demonstrating unprecedented capabilities in zero-shot and few-shot learning scenarios (Brown et al. 2020, OpenAI 2023). Given the strong performance of closed LLMs such as GPT-4, many studies within the LLM-as-a-judge paradigm rely on their scores as ground-truth benchmarks for evaluating both open and closed LLMs (Chiang and Lee 2023), further entrenching the dominance of state-of-the-art (SOTA) closed LLMs (Vergho et al. 2024). Alongside closed LLMs, there are LLMs whose pre-trained models (i.e., model weights) and inference code are publicly available ("open LLMs"), such as Llama (Touvron et al. 2023, Dubey et al. 2024), as well as LLMs whose full training data and training code are also available ("open-source LLMs"), such as OLMo (Groeneveld et al. 2024). Open and open-source LLMs provide varying levels of transparency to developers and researchers (Liu et al. 2023). Access to model weights, training data, and inference code affords the user-developer-researcher community several benefits, including lower costs per input/output token through third-party API services, support for local/offline pre-training and fine-tuning, and deeper analysis of model biases and debiasing strategies. However, the dominance of closed LLMs raises a number of concerns, including accessibility and fairness (Strubell et al. 2020, Bender 2021, Irugalbandara et al. 2024).
arXiv.org Artificial Intelligence
Mar-14-2025