Cloze Question
Difficulty-Controllable Cloze Question Distractor Generation
Kang, Seokhoon, Jeon, Yejin, Hwang, Seonjeong, Lee, Gary Geunbae
Multiple-choice cloze questions are commonly used to assess linguistic proficiency and comprehension. However, generating high-quality distractors remains challenging, as existing methods often lack adaptability and control over difficulty levels, and the absence of difficulty-annotated datasets further hinders progress. To address these issues, we propose a novel framework for generating distractors with controllable difficulty by leveraging both data augmentation and a multitask learning strategy. First, to create a high-quality, difficulty-annotated dataset, we introduce a two-way distractor generation process that produces diverse and plausible distractors. These candidates are subsequently refined through filtering and then categorized by difficulty using an ensemble QA system. Second, the resulting dataset is used to train a difficulty-controllable generation model via multitask learning. The framework includes carefully designed auxiliary tasks that enhance the model's semantic understanding of distractors and its ability to estimate their difficulty. Experimental results demonstrate that our method generates high-quality distractors across difficulty levels and substantially outperforms GPT-4o in aligning distractor difficulty with human perception.
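As a rough illustration of the difficulty-annotation step, the sketch below labels a distractor by how many models in a QA ensemble it fools. The callable ensemble interface, the three-level bucketing, and the thresholds are all assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch: bucket distractor difficulty by how many QA models it fools.
# QA models are passed in as callables (question, options) -> chosen option,
# so any backend can be plugged in; the thresholds below are illustrative.
from typing import Callable, List

QAModel = Callable[[str, List[str]], str]

def difficulty_of_distractor(
    cloze_question: str,
    answer: str,
    distractor: str,
    ensemble: List[QAModel],
) -> str:
    """Label a distractor easy/medium/hard by the fraction of QA models it fools."""
    options = [answer, distractor]
    fooled = sum(
        1 for model in ensemble if model(cloze_question, options) == distractor
    )
    ratio = fooled / len(ensemble)
    if ratio < 1 / 3:
        return "easy"    # almost no model picks it -> easy to reject
    if ratio < 2 / 3:
        return "medium"
    return "hard"        # most models are fooled -> hard distractor

# Toy usage with trivial stand-in "models" that always pick a fixed option.
if __name__ == "__main__":
    always_answer = lambda q, opts: opts[0]
    always_distractor = lambda q, opts: opts[1]
    ensemble = [always_answer, always_answer, always_distractor]
    print(difficulty_of_distractor(
        "The capital of France is ____.", "Paris", "Lyon", ensemble))  # -> "medium"
```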
Constructing Cloze Questions Generatively
We present a generative method called CQG for constructing cloze questions from a given article using neural networks and WordNet, with an emphasis on generating multigram distractors. Built on sense disambiguation, text-to-text transformation, and WordNet's synset taxonomies and lexical labels, CQG selects an answer key for a given sentence, segments it into a sequence of instances, and generates instance-level distractor candidates (IDCs) using a transformer and sibling synsets. It then removes inappropriate IDCs, ranks the remaining IDCs based on contextual embedding similarities as well as synset and lexical relatedness, forms distractor candidates by combinatorially replacing instances with the corresponding top-ranked IDCs, and checks whether the results are legitimate phrases. Finally, it selects top-ranked distractor candidates based on contextual semantic similarity to the answer key. Experiments show that this method significantly outperforms the state of the art, and human judges confirm the high quality of the generated distractors.
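As a hint of how sibling synsets yield instance-level distractor candidates, here is a minimal sketch using NLTK's WordNet interface; CQG's candidate selection and all of its later filtering and ranking stages are omitted.

```python
# Minimal sketch of the sibling-synset step: candidates for a word are lemmas of
# synsets that share a hypernym with one of its senses.
# Requires the NLTK WordNet data: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def sibling_candidates(word: str, pos: str = "n") -> set:
    candidates = set()
    for synset in wn.synsets(word, pos=pos):
        for hypernym in synset.hypernyms():
            for sibling in hypernym.hyponyms():
                if sibling == synset:
                    continue  # skip the word's own synset
                candidates.update(
                    lemma.name().replace("_", " ") for lemma in sibling.lemmas()
                )
    return candidates - {word}

print(sorted(sibling_candidates("violin"))[:10])  # e.g. cello, viola, ...
```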
UniDM: A Unified Framework for Data Manipulation with Large Language Models
Qian, Yichen, He, Yongyi, Zhu, Rong, Huang, Jintao, Ma, Zhijian, Wang, Haibin, Wang, Yaohua, Sun, Xiuyu, Lian, Defu, Ding, Bolin, Zhou, Jingren
Designing effective data manipulation methods is a long-standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human effort to collect training data and tune models. Recent methods apply Large Language Models (LLMs) to resolve multiple data manipulation tasks. They show clear performance benefits but still require customized designs to fit each specific task, which is costly and cannot keep pace with the requirements of big data lake platforms. In this paper, inspired by the cross-task generality of LLMs on NLP tasks, we take the first step toward an automatic and general solution for data manipulation tasks. We propose UniDM, a unified framework that establishes a new paradigm for processing data manipulation tasks with LLMs. UniDM formalizes a number of data manipulation tasks in a unified form and abstracts three main general steps to solve each task. We develop automatic context retrieval to allow the LLMs to retrieve data from data lakes that potentially contains evidence and factual information. For each step, we design effective prompts to guide LLMs to produce high-quality results. In a comprehensive evaluation on a variety of benchmarks, UniDM exhibits strong generality and state-of-the-art performance on a wide variety of data manipulation tasks.
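To make the "unified form" concrete, the following sketch (my own illustration, not UniDM's actual prompts or API) casts two different manipulation tasks, data imputation and error detection, into one template of retrieved context, task instruction, and target record; the retrieval step and the LLM call themselves are omitted.

```python
# Illustrative sketch of a unified prompt form for data manipulation tasks.
# The template and example records are assumptions; only the instruction
# changes between tasks.
from typing import List

TEMPLATE = """Context records from the data lake:
{context}

Task: {instruction}
Record: {record}
Answer:"""

def build_prompt(instruction: str, record: str, context_rows: List[str]) -> str:
    return TEMPLATE.format(
        context="\n".join(context_rows), instruction=instruction, record=record
    )

# Two tasks, one form: only the instruction changes.
imputation = build_prompt(
    "Fill in the missing 'city' attribute.",
    "name=Golden Gate Bridge, state=CA, city=?",
    ["name=Bay Bridge, state=CA, city=San Francisco"],
)
error_detection = build_prompt(
    "Does this record contain an erroneous attribute value? Answer yes or no.",
    "name=Eiffel Tower, city=Berlin",
    ["name=Eiffel Tower, city=Paris, country=France"],
)
print(imputation)
```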
A Theory for Emergence of Complex Skills in Language Models
Arora, Sanjeev, Goyal, Anirudh
A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this {\em slingshot generalization} since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving $k$-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves.
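A back-of-the-envelope version of the framework's key claim: if a model fails each elementary skill independently with rate ε, a task requiring k skills succeeds with probability (1-ε)^k ≈ 1-kε for small ε, so k-tuple competence improves at essentially the same rate as elementary competence as scaling drives ε down. The independence assumption and the numbers below are illustrative only, not the paper's analysis.

```python
# Toy calculation: competence on k-tuples of skills tracks elementary-skill
# competence, assuming an independent per-skill failure rate eps.
for eps in (0.20, 0.10, 0.05, 0.01):   # eps shrinking as the model is scaled up
    for k in (1, 2, 4):
        success = (1 - eps) ** k
        print(f"eps={eps:.2f}  k={k}  P(all {k} skills ok)={success:.3f}")
```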
Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension
Lovenia, Holy, Wilie, Bryan, Chung, Willy, Zeng, Min, Cahyawijaya, Samuel, Su, Dan, Fung, Pascale
Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task. Unfortunately, existing adaptations mainly involve deterministic rules that cannot generalize well. Here, we propose Clozer, a sequence-tagging based cloze answer extraction method used in TAPT that is extendable to any cloze-style machine reading comprehension (MRC) downstream task. We experiment on multiple-choice cloze-style MRC tasks and show that Clozer significantly outperforms the oracle and the state of the art at boosting TAPT's effect on model performance, and we demonstrate that Clozer recognizes gold answers independently of any heuristics.
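To show what "sequence-tagging based cloze answer extraction" produces, here is a minimal sketch that turns a sentence plus predicted BIO tags into a masked cloze example for TAPT; the tagger itself, the tag names, and the mask token are placeholders of my own, not Clozer's actual implementation.

```python
# Minimal sketch: convert tokens + BIO tags (from a trained tagger, not shown)
# into a cloze-style example by masking the predicted answer span.
from typing import List, Tuple

def bio_to_cloze(tokens: List[str], tags: List[str],
                 mask: str = "@placeholder") -> Tuple[str, str]:
    """Return (cloze_text, gold_answer) from one tagged sentence."""
    span = [i for i, t in enumerate(tags) if t in ("B-ANS", "I-ANS")]
    if not span:
        raise ValueError("no answer span tagged")
    answer = " ".join(tokens[span[0]: span[-1] + 1])
    masked = tokens[: span[0]] + [mask] + tokens[span[-1] + 1:]
    return " ".join(masked), answer

tokens = "Marie Curie won the Nobel Prize in 1903".split()
tags = ["B-ANS", "I-ANS", "O", "O", "O", "O", "O", "O"]
print(bio_to_cloze(tokens, tags))
# ('@placeholder won the Nobel Prize in 1903', 'Marie Curie')
```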
All NLP Tasks Are Generation Tasks: A General Pretraining Framework
Du, Zhengxiao, Qian, Yujie, Liu, Xiao, Ding, Ming, Qiu, Jiezhong, Yang, Zhilin, Tang, Jie
There have been various types of pretraining architectures, including autoregressive models (e.g., GPT), autoencoding models (e.g., BERT), and encoder-decoder models (e.g., T5). NLP tasks, meanwhile, differ in nature, with three main categories: classification, unconditional generation, and conditional generation. However, no pretraining framework performs best on all tasks, which complicates model development and selection. We propose a novel pretraining framework, GLM (General Language Model), to address this challenge. Compared to previous work, our architecture has three major benefits: (1) it performs well on classification, unconditional generation, and conditional generation tasks with a single pretrained model; (2) it outperforms BERT-like models on classification due to improved pretrain-finetune consistency; (3) it naturally handles variable-length blank filling, which is crucial for many downstream tasks. Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pre-training data. Moreover, GLM with 1.25× the parameters of BERT-Large achieves the best performance in NLU, conditional generation, and unconditional generation at the same time, demonstrating its generalizability across downstream tasks.
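One way to see why blank filling can unify the three task types: every task can be posed as recovering masked spans. The sketch below is my own illustration of that idea, not GLM's actual preprocessing; the templates and mask token are assumptions.

```python
# Illustration: casting different NLP task types as blank filling in one format.
# "[MASK]" marks the span the model must generate; the templates are assumptions.
def as_blank_filling(task: str, text: str) -> str:
    if task == "sentiment":        # classification -> short blank ("good"/"bad")
        return f"{text} It is really [MASK]."
    if task == "summarization":    # conditional generation -> long blank
        return f"{text} TL;DR: [MASK]"
    if task == "lm":               # unconditional generation -> everything blank
        return "[MASK]"
    raise ValueError(task)

print(as_blank_filling("sentiment", "The movie was a delight."))
print(as_blank_filling("summarization", "Long article text ..."))
```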
Can This Tiny Language Model Defeat Gigantic GPT3?
While GPT-3 has been bragging about achieving state-of-the-art performance on complex NLP tasks with hundreds of billions of parameters, researchers from LMU Munich, Germany have proposed a language model that can show similar achievements with far fewer parameters. GPT-3 was trained with 175 billion parameters and thus showed remarkable few-shot abilities; by reformulating a few tasks and priming inputs, it also showed immense capabilities on the SuperGLUE benchmark. However, it comes with two significant drawbacks -- large models aren't always feasible for real-world scenarios, and because the context window of these monstrous models is limited to a few hundred tokens, priming doesn't scale beyond a few examples. Thus, the researchers proposed an alternative to priming: PET (Pattern-Exploiting Training), which requires unlabelled data that is easier to gather than labelled data, making it usable for real-world applications.
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Schick, Timo, Schütze, Hinrich
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance on challenging natural language understanding benchmarks. In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements. Based on our findings, we identify several key factors required for successful natural language understanding with small language models.
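The core trick, rendered as a minimal sketch: wrap the input in a cloze pattern and compare the masked-LM scores of a few verbalizer tokens at the mask position. This uses an off-the-shelf bert-base-uncased for brevity, and the pattern and verbalizers are illustrative choices; PET's actual gradient-based training over patterns and its use of unlabeled data are not shown.

```python
# Minimal PET-style cloze classification: score verbalizer tokens at the mask
# position of a pattern, using a pretrained masked LM (pip install transformers).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def classify(text: str, verbalizers: dict) -> str:
    # Pattern: "<text> It was [MASK]." -- one illustrative choice among many.
    inputs = tok(f"{text} It was {tok.mask_token}.", return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits
    mask_pos = (inputs.input_ids[0] == tok.mask_token_id).nonzero().item()
    scores = {
        label: logits[0, mask_pos, tok.convert_tokens_to_ids(word)].item()
        for label, word in verbalizers.items()
    }
    return max(scores, key=scores.get)

print(classify("Best pizza ever!", {"positive": "great", "negative": "terrible"}))
```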
Using AI-generated questions to train NLP systems
We describe a recent approach to the popular extractive question answering (extractive QA) task that generates its own training data instead of requiring existing annotated question answering examples. Extractive QA is a popular task in natural language processing (NLP) research, where models must extract a short snippet from a document to answer a natural language question. Though supervised models perform well at extractive QA, they require thousands -- sometimes hundreds of thousands -- of annotated examples for training, and their performance suffers when tested outside the textual domains and languages they were trained on. By approaching extractive QA as a self-supervised task, our technique outperformed early supervised models on the widely used SQuAD data set while requiring no annotated question answering training data. The code for our method is now available to download.
Unsupervised Question Answering by Cloze Translation
Lewis, Patrick, Denoyer, Ludovic, Riedel, Sebastian
Obtaining training data for Question Answering (QA) is time-consuming and resource-intensive, and existing QA datasets are only available for limited domains and languages. In this work, we explore to what extent high-quality training data is actually required for Extractive QA, and investigate the possibility of unsupervised Extractive QA. We approach this problem by first learning to generate context, question, and answer triples in an unsupervised manner, which we then use to synthesize Extractive QA training data automatically. To generate such triples, we first sample random context paragraphs from a large corpus of documents and then random noun phrases or named entity mentions from these paragraphs as answers. Next we convert answers in context to "fill-in-the-blank" cloze questions and finally translate them into natural questions. We propose and compare various unsupervised ways to perform cloze-to-natural question translation, including training an unsupervised NMT model using non-aligned corpora of natural questions and cloze questions, as well as a rule-based approach. We find that modern QA models can learn to answer human questions surprisingly well using only synthetic training data. We demonstrate that, without using the SQuAD training data at all, our approach achieves 56.4 F1 on SQuAD v1 (64.5 F1 when the answer is a named entity mention), outperforming early supervised models.
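A toy version of the rule-based cloze-to-question path: pick a named entity as the answer, blank it out, and map its entity type to a wh-word. spaCy is used here for NER purely as a convenience, and the wh-word table is my own simplification; the actual system also explores unsupervised NMT for this translation step.

```python
# Minimal sketch of rule-based cloze generation + cloze-to-question translation.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

WH = {"PERSON": "who", "GPE": "where", "LOC": "where",
      "DATE": "when", "ORG": "what organization"}

nlp = spacy.load("en_core_web_sm")

def cloze_and_question(sentence: str):
    doc = nlp(sentence)
    for ent in doc.ents:
        if ent.label_ in WH:
            cloze = sentence.replace(ent.text, "____")
            # Crude wh-substitution; real systems reorder/inflect the question.
            question = sentence.rstrip(".").replace(ent.text, WH[ent.label_]) + "?"
            return cloze, question, ent.text
    return None

print(cloze_and_question("Marie Curie discovered radium in 1898."))
# e.g. ('____ discovered radium in 1898.',
#       'who discovered radium in 1898?', 'Marie Curie')
```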