Goto

Collaborating Authors

 Sha, Lele


Annotation Guidelines-Based Knowledge Augmentation: Towards Enhancing Large Language Models for Educational Text Classification

arXiv.org Artificial Intelligence

Various machine learning approaches have gained significant popularity for the automated classification of educational text to identify indicators of learning engagement, i.e., learning engagement classification (LEC). LEC can offer comprehensive insights into human learning processes and has attracted significant interest from diverse research communities, including Natural Language Processing (NLP), Learning Analytics, and Educational Data Mining. Recently, Large Language Models (LLMs), such as ChatGPT, have demonstrated remarkable performance in various NLP tasks; however, their evaluation and improvement on LEC tasks have not been thoroughly investigated. In this study, we propose the Annotation Guidelines-based Knowledge Augmentation (AGKA) approach to improve LLMs. AGKA uses GPT 4.0 to retrieve label definition knowledge from annotation guidelines and then applies a random under-sampler to select a few typical examples. Subsequently, we assemble a systematic evaluation benchmark for LEC comprising six datasets that cover behavior classification (question and urgency level), emotion classification (binary and epistemic emotion), and cognition classification (opinion and cognitive presence). The results demonstrate that AGKA can enhance non-fine-tuned LLMs, particularly GPT 4.0 and Llama 3 70B. GPT 4.0 with AGKA few-shot prompting outperforms full-shot fine-tuned models (i.e., models fine-tuned on the full training set) such as BERT and RoBERTa on simple binary classification datasets, but it lags behind them on multi-class tasks that require a deep understanding of complex semantic information. Notably, Llama 3 70B with AGKA is a promising open-source alternative, as its performance is on par with that of the closed-source GPT 4.0 with AGKA. In addition, LLMs struggle to distinguish between labels with similar names in multi-class classification.
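To make the AGKA recipe above concrete, here is a minimal sketch of the prompt-assembly side of the idea: balanced random under-sampling of a labelled example pool, followed by a prompt that combines guideline-derived label definitions with the sampled few-shot examples. Everything here is an illustrative assumption rather than the paper's actual implementation: the helper names (select_few_shot, build_prompt), the toy labels and definitions, and the use of the OpenAI chat completions API with the "gpt-4" model name.

    # Sketch of an AGKA-style few-shot prompt builder (illustrative only).
    # Assumes a labelled example pool and label definitions already
    # extracted from the annotation guidelines.
    import random
    from collections import defaultdict
    from openai import OpenAI

    def select_few_shot(examples, k_per_label, seed=42):
        """Random under-sampling: draw up to k examples per label."""
        random.seed(seed)
        by_label = defaultdict(list)
        for text, label in examples:
            by_label[label].append(text)
        shots = []
        for label, texts in by_label.items():
            for text in random.sample(texts, min(k_per_label, len(texts))):
                shots.append((text, label))
        return shots

    def build_prompt(label_definitions, shots, query):
        """Compose guideline-derived label definitions and few-shot
        examples into a single classification prompt."""
        defs = "\n".join(f"- {label}: {d}" for label, d in label_definitions.items())
        examples = "\n\n".join(f"Text: {t}\nLabel: {l}" for t, l in shots)
        return (f"Classify the text using these label definitions:\n{defs}\n\n"
                f"{examples}\n\nText: {query}\nLabel:")

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    pool = [("I don't get step 2 at all.", "confusion"),
            ("This lecture was great!", "no-confusion")]   # toy example pool
    defs = {"confusion": "Expresses uncertainty about course content.",
            "no-confusion": "No expressed uncertainty."}   # placeholder definitions
    prompt = build_prompt(defs, select_few_shot(pool, k_per_label=1),
                          "Why does step 3 fail?")
    reply = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}])
    print(reply.choices[0].message.content)

Sampling the same number of examples per label is one straightforward reading of "random under-sampler"; it keeps the few-shot context balanced across classes, which matters for the multi-class settings where the paper reports LLMs confusing similarly named labels.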


Detecting AI-Generated Sentences in Human-AI Collaborative Hybrid Texts: Challenges, Strategies, and Insights

arXiv.org Artificial Intelligence

This study explores the challenge of sentence-level AI-generated text detection in human-AI collaborative hybrid texts. Existing studies of AI-generated text detection for hybrid texts often rely on synthetic datasets, which typically contain hybrid texts with only a limited number of authorship boundaries. We contend that studies of detecting AI-generated content within hybrid texts should cover the different types of hybrid texts produced in realistic settings to better inform real-world applications. Our study therefore utilizes the CoAuthor dataset, which includes diverse, realistic hybrid texts generated through multi-turn collaboration between human writers and an intelligent writing system. We adopt a two-step, segmentation-based pipeline: (i) detect segments within a given hybrid text such that each segment contains sentences of consistent authorship, and (ii) classify the authorship of each identified segment. Our empirical findings highlight that (1) detecting AI-generated sentences in hybrid texts is overall a challenging task because (1.1) human writers select and even edit AI-generated sentences based on personal preferences, which makes it harder to identify the authorship of segments; (1.2) the frequent change of authorship between neighboring sentences within a hybrid text makes it difficult for segment detectors to identify authorship-consistent segments; and (1.3) the short length of text segments within hybrid texts provides limited stylistic cues for reliable authorship determination; and (2) before embarking on the detection process, it is beneficial to assess the average length of segments within the hybrid text, which aids in deciding whether (2.1) to employ a text segmentation-based strategy for hybrid texts with longer segments, or (2.2) to adopt a direct sentence-by-sentence classification strategy for those with shorter segments (see the sketch below).
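The length-based strategy choice in finding (2) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's detector: the per-sentence authorship predictions are toy values, and the threshold of 3 sentences is an arbitrary placeholder for whatever cutoff an application would tune.

    # Sketch: recover authorship-consistent segments from per-sentence
    # labels, then pick a detection strategy from the average segment length.
    from itertools import groupby

    def group_segments(sentence_labels):
        """Merge consecutive sentences with the same (predicted) author
        into authorship-consistent segments of (author, length)."""
        return [(author, len(list(group)))
                for author, group in groupby(sentence_labels)]

    def choose_strategy(avg_segment_len, threshold=3.0):
        """Longer segments favour segmentation; shorter ones favour direct
        sentence-by-sentence classification (threshold is an assumption)."""
        return "segmentation" if avg_segment_len >= threshold else "per-sentence"

    labels = ["human", "human", "ai", "ai", "ai", "human"]  # toy predictions
    segments = group_segments(labels)   # [('human', 2), ('ai', 3), ('human', 1)]
    avg_len = sum(n for _, n in segments) / len(segments)
    print(choose_strategy(avg_len))     # 'per-sentence' (average 2.0 < 3.0)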


Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education

arXiv.org Artificial Intelligence

Recent large language models (LLMs), e.g., ChatGPT, can generate human-like, fluent responses when provided with specific instructions. While acknowledging the convenience brought by this technological advancement, educators are also concerned that students might use LLMs to complete their writing assignments and pass the output off as their original work. Although many AI content detection studies have been conducted in response to such concerns, most prior studies modeled AI content detection as a classification problem, assuming that a text is either entirely human-written or entirely AI-generated. In this study, we investigated AI content detection in a rarely explored yet realistic setting where the text to be detected is written collaboratively by humans and generative LLMs (i.e., hybrid text). We first formalized the detection task as identifying the transition points between human-written content and AI-generated content in a given hybrid text (boundary detection). We then proposed a two-step approach in which we (1) separated AI-generated content from human-written content during the encoder training process, and (2) calculated the distances between every two adjacent prototypes, assuming that a boundary exists between the two adjacent prototypes that are furthest from each other. Through extensive experiments, we observed the following main findings: (1) the proposed approach consistently outperformed the baseline methods across different experiment settings; (2) the encoder training process can significantly boost the performance of the proposed approach; and (3) when detecting boundaries in single-boundary hybrid essays, the proposed approach can be further enhanced by adopting a relatively large prototype size, leading to a 22% improvement in the In-Domain evaluation and an 18% improvement in the Out-of-Domain evaluation.
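The adjacent-prototype idea in step (2) admits a short numerical sketch. Everything here is an assumption for illustration: the paper learns its encoder and prototypes during training, whereas this toy takes a prototype to be the mean embedding of a fixed-size window of consecutive sentences, uses Euclidean distance, and feeds in synthetic embeddings.

    # Sketch: place the boundary between the two adjacent prototypes
    # that are furthest apart (illustrative, not the paper's method).
    import numpy as np

    def predict_boundary(sentence_embeddings, prototype_size=2):
        """Return the index of the last sentence before the predicted boundary."""
        embs = np.asarray(sentence_embeddings, dtype=float)
        # One prototype = mean embedding of `prototype_size` consecutive sentences.
        prototypes = [embs[i:i + prototype_size].mean(axis=0)
                      for i in range(0, len(embs) - prototype_size + 1, prototype_size)]
        # Distance between every pair of adjacent prototypes.
        gaps = [np.linalg.norm(p - q) for p, q in zip(prototypes, prototypes[1:])]
        widest = int(np.argmax(gaps))   # adjacent pair furthest from each other
        return (widest + 1) * prototype_size - 1

    rng = np.random.default_rng(0)
    toy = np.vstack([rng.normal(0.0, 1.0, (4, 8)),    # 4 "human" sentence embeddings
                     rng.normal(5.0, 1.0, (4, 8))])   # 4 "AI" embeddings, shifted
    print(predict_boundary(toy))  # 3 -> boundary between sentences 3 and 4

The finding that a larger prototype size helps on single-boundary essays is intuitive in this framing: averaging more sentences per prototype smooths out per-sentence noise, so the one real authorship gap stands out more clearly against spurious ones.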


Practical and Ethical Challenges of Large Language Models in Education: A Systematic Scoping Review

arXiv.org Artificial Intelligence

Advancements in generative artificial intelligence (AI) and large language models (LLMs) have fueled the development of many educational technology innovations that aim to automate the often time-consuming and laborious tasks of generating and analysing textual content (e.g., generating open-ended questions and analysing student feedback surveys) (Kasneci et al., 2023; Wollny et al., 2021; Leiker et al., 2023). LLMs are generative AI models that have been trained on extensive amounts of text data and are capable of generating human-like text content based on natural language inputs. Specifically, LLMs such as Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) and the Generative Pre-trained Transformer (GPT) (Brown et al., 2020) utilise deep learning and self-attention mechanisms (Vaswani et al., 2017) to selectively attend to different parts of the input text depending on the focus of the current task, allowing the model to learn complex patterns and relationships among textual content, such as semantic, contextual, and syntactic relationships (Min et al., 2021; Liu et al., 2023). As several LLMs (e.g., GPT-3 and Codex) have been pre-trained on massive amounts of data across multiple disciplines, they are capable of completing natural language processing tasks with little (few-shot learning) or no additional training (zero-shot learning) (Brown et al., 2020; Wu et al., 2023). This could lower the technological barriers to LLM-based innovations, as researchers and practitioners can develop new educational technologies by fine-tuning LLMs on specific educational tasks without starting from scratch (Caines et al., 2023; Sridhar et al., 2023). The recent release of ChatGPT, an LLM-based generative AI chatbot that requires only natural language prompts and no additional model training or fine-tuning (OpenAI, 2023), has further lowered the barrier for individuals without a technological background to leverage the generative power of LLMs. Although educational research that leverages LLMs to develop technological innovations for automating educational tasks has yet to achieve its full potential (i.e., most works have focused on improving model performance (Kurdi et al., 2020; Ramesh and Sanampudi, 2022)), a growing body of literature hints at how different stakeholders could potentially benefit from such innovations.