Automated Text Scoring in the Age of Generative AI for the GPU-poor

Ormerod, Christopher Michael, Kwako, Alexander

Jul-1-2024–arXiv.org Artificial Intelligence

Generative language models (GLMs), such as GPT-4 [34] and Claude [2], have demonstrated strong performance across a variety of language and reasoning tasks. In education, researchers are exploring the extent to which these models can perform tasks such as automated essay scoring [56], student feedback [4], individual tutoring [7], and more [15]. Although GLMs show promise in automating certain educational tasks, critical limitations hinder wider implementation. For instance, researchers have shown that GLMs can be "jail-broken" to bypass safety guardrails [58] and can disclose personally identifiable information. The largest GLMs require millions of dollars to train and deploy; as such, they are highly inefficient for specialized tasks [26]. These models are constantly being updated, sometimes leading to degraded performance [6], and they are accessible only via Application Programming Interfaces (APIs), which raises issues of replicability and leaves little room to conduct rigorous research. For these reasons, we shift the focus away from large, proprietary GLMs toward smaller, open-source GLMs. In this study, we focus on two educational applications: Automated Text Scoring (ATS) and providing feedback, specifically feedback that justifies scores based on the scoring rubric. Our study is the first to demonstrate that it is possible to efficiently fine-tune such GLMs to yield high-quality scores, and that (at least some) feedback from fine-tuned models can explain these scores.

glm, large language model, machine learning

Country:
- Asia > Singapore (0.04)
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - New Jersey > Bergen County
      - Mahwah (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Spain
  - Aragón (0.04)
  - Catalonia > Barcelona Province
    - Barcelona (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education
  - Assessment & Standards > Student Performance (0.57)
  - Educational Technology > Educational Software
    - Computer-Aided Assessment (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.96)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.50)
