Yu, Mengxia
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
Nguyen, Bang, Du, Tingting, Yu, Mengxia, Angrave, Lawrence, Jiang, Meng
While the Question Generation (QG) task has been increasingly adopted in educational assessments, its evaluation remains limited by approaches that lack a clear connection to the educational values of test items. In this work, we introduce test item analysis, a method frequently used by educators to assess test question quality, into QG evaluation. Specifically, we construct pairs of candidate questions that differ in quality across dimensions such as topic coverage, item difficulty, item discrimination, and distractor efficiency. We then examine whether existing QG evaluation approaches can effectively distinguish these differences. Our findings reveal significant shortcomings in these approaches with respect to accurately assessing test item quality in relation to student performance. To address this gap, we propose a novel QG evaluation framework, QG-SMS, which leverages Large Language Models for Student Modeling and Simulation to perform test item analysis. As demonstrated in our extensive experiments and human evaluation study, the additional perspectives introduced by the simulated student profiles lead to a more effective and robust assessment of test items.
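To make the item-analysis dimensions above concrete, the sketch below computes the classical statistics they refer to (item difficulty as the proportion of correct responses, item discrimination as a point-biserial correlation with total score, and distractor efficiency as the share of distractors that attract responses). It is a generic illustration over toy simulated responses, not the QG-SMS implementation.

```python
# Minimal illustration of classical test item analysis statistics.
# Generic formulas over toy data; not the QG-SMS implementation.
from statistics import mean, pstdev

def item_difficulty(correct_flags):
    """Proportion of students answering the item correctly (the p-value)."""
    return mean(correct_flags)

def item_discrimination(correct_flags, total_scores):
    """Point-biserial correlation between item correctness and total score."""
    n = len(correct_flags)
    sigma = pstdev(total_scores)
    right = [s for f, s in zip(correct_flags, total_scores) if f]
    wrong = [s for f, s in zip(correct_flags, total_scores) if not f]
    if not right or not wrong or sigma == 0:
        return 0.0
    p = len(right) / n
    return (mean(right) - mean(wrong)) / sigma * (p * (1 - p)) ** 0.5

def distractor_efficiency(choice_counts, answer_key, threshold=0.05):
    """Share of distractors chosen by at least `threshold` of examinees."""
    total = sum(choice_counts.values())
    distractors = [c for c in choice_counts if c != answer_key]
    functional = [c for c in distractors if choice_counts[c] / total >= threshold]
    return len(functional) / len(distractors)

# Toy example: 6 simulated students; the item is answered correctly by 4.
correct = [1, 1, 0, 1, 0, 1]
totals = [18, 15, 9, 16, 7, 14]
choices = {"A": 4, "B": 1, "C": 1, "D": 0}  # "A" is the answer key
print(item_difficulty(correct))             # 0.67
print(item_discrimination(correct, totals)) # high positive discrimination
print(distractor_efficiency(choices, "A"))  # 2 of 3 distractors functional
```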
The Super Weight in Large Language Models
Yu, Mengxia, Wang, De, Shan, Qi, Reed, Colorado, Wan, Alvin
Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such as 0.01%, translate to hundreds of thousands of parameters. In this work, we present an even more surprising finding: Pruning as few as a single parameter can destroy an LLM's ability to generate text - increasing perplexity by 3 orders of magnitude and reducing zero-shot accuracy to guessing. We propose a data-free method for identifying such parameters, termed super weights, using a single forward pass through the model. We additionally find that these super weights induce correspondingly rare and large activation outliers, termed super activations. When preserved with high precision, super activations can improve simple round-to-nearest quantization to become competitive with state-of-the-art methods. For weight quantization, we similarly find that by preserving the super weight and clipping other weight outliers, round-to-nearest quantization can scale to much larger block sizes than previously considered.

Large Language Models (LLMs) have been growing in size and capability at an unprecedented rate, enabling them to capture increasingly complex linguistic patterns across a wide range of tasks. However, with this increase in model scale, new and unexpected behaviors have emerged. Dettmers et al. (2022) discovered that once LLMs reach a certain scale, a small set of hidden state features contains outliers of exceptionally large magnitude.
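The sketch below shows one way a single-forward-pass inspection could be set up with an open Llama-style checkpoint in Hugging Face transformers: a forward hook records the largest-magnitude activation entering and leaving each MLP down-projection, and layers with extreme values are candidates for hosting a super weight. The model name, layer path, and ranking heuristic are illustrative assumptions, not the authors' released method.

```python
# Sketch: record unusually large activations at each MLP down-projection
# during one forward pass. Illustrative only; assumes a Llama-style model
# in Hugging Face transformers, not the paper's released implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed stand-in checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

records = []  # (layer index, max |down_proj input|, max |down_proj output|)

def make_hook(layer_idx):
    def hook(module, inputs, output):
        x = inputs[0].detach()
        records.append((layer_idx,
                        x.abs().max().item(),
                        output.detach().abs().max().item()))
    return hook

handles = [layer.mlp.down_proj.register_forward_hook(make_hook(i))
           for i, layer in enumerate(model.model.layers)]

with torch.no_grad():
    batch = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
    model(**batch)

for h in handles:
    h.remove()

# Layers whose activation magnitudes dwarf the rest are candidates for
# hosting a super weight; zeroing that single coordinate and re-measuring
# perplexity would probe the sensitivity described in the abstract.
for layer_idx, max_in, max_out in sorted(records, key=lambda r: -r[2])[:5]:
    print(f"layer {layer_idx}: max|down_proj in|={max_in:.1f}, max|out|={max_out:.1f}")
```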
Reference-based Metrics Disprove Themselves in Question Generation
Nguyen, Bang, Yu, Mengxia, Huang, Yun, Jiang, Meng
Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric was expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.
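As an illustration of the reference-free idea, the sketch below scores a generated question with a rubric over the three criteria named above. The prompt wording and the `judge` placeholder are assumptions for illustration, not the paper's metric.

```python
# Sketch of a reference-free, rubric-based QG score along the dimensions
# named above (naturalness, answerability, complexity). The prompt wording
# and the `judge` placeholder are illustrative, not the paper's metric.
import json

RUBRIC = """You are grading a machine-generated question about a passage.
Rate each criterion from 1 (poor) to 5 (excellent) and reply as JSON:
- naturalness: is the question fluent and human-like?
- answerability: can it be answered from the passage alone?
- complexity: does answering require non-trivial reasoning over the passage?

Passage: {context}
Question: {question}
JSON:"""

def judge(prompt: str) -> str:
    """Placeholder: call any chat LLM here and return its text response."""
    raise NotImplementedError

def score_question(context: str, question: str) -> float:
    reply = judge(RUBRIC.format(context=context, question=question))
    ratings = json.loads(reply)
    # Average the per-dimension ratings; no reference question is needed.
    return sum(ratings[k] for k in ("naturalness", "answerability", "complexity")) / 3
```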
Pre-training Language Models for Comparative Reasoning
Yu, Mengxia, Zhang, Zhihan, Yu, Wenhao, Jiang, Meng
Comparative reasoning is the process of comparing objects, concepts, or entities to draw conclusions, and it constitutes a fundamental cognitive ability. In this paper, we propose a novel framework to pre-train language models to enhance their comparative reasoning abilities over texts. While there have been approaches for NLP tasks that require comparative reasoning, they suffer from costly manual data labeling and limited generalizability to different tasks. Our approach introduces a novel method of collecting scalable data for text-based entity comparison, which leverages both structured and unstructured data. Moreover, we present a framework for pre-training language models via three novel objectives on comparative reasoning. Evaluation on downstream tasks including comparative question answering, question generation, and summarization shows that our pre-training framework significantly improves the comparative reasoning abilities of language models, especially under low-resource conditions. This work also releases the first integrated benchmark for comparative reasoning.
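As a toy illustration of text-based entity comparison drawn from structured data, the sketch below turns two attribute tables into simple comparative statements. The entities, attributes, and templates are invented for illustration; this is not the paper's data-collection pipeline or its pre-training objectives.

```python
# Toy sketch: turning structured attribute tables into textual comparison
# instances. Purely illustrative of "text-based entity comparison" data;
# not the paper's data-collection pipeline or training objectives.
ENTITIES = {
    "Python": {"paradigm": "multi-paradigm", "typing": "dynamic", "first_release": 1991},
    "Rust":   {"paradigm": "multi-paradigm", "typing": "static",  "first_release": 2015},
}

def compare(a: str, b: str) -> list[str]:
    """Build simple comparative statements over shared attributes of two entities."""
    statements = []
    shared = ENTITIES[a].keys() & ENTITIES[b].keys()
    for attr in sorted(shared):
        va, vb = ENTITIES[a][attr], ENTITIES[b][attr]
        if va == vb:
            statements.append(f"Both {a} and {b} have {attr} = {va}.")
        else:
            statements.append(f"{a} and {b} differ in {attr}: {va} vs. {vb}.")
    return statements

for s in compare("Python", "Rust"):
    print(s)
```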
A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods
Zhang, Zhihan, Yu, Wenhao, Yu, Mengxia, Guo, Zhichun, Jiang, Meng
By focusing on one such task, the model ignores knowledge from the training signals of related tasks (Ruder, 2017). There are a great number of tasks in NLP, from syntax parsing to information extraction, from machine translation to question answering: each requires a model dedicated to learning from data. Biologically, humans learn natural languages, from basic grammar to complex semantics, in a single brain (Hashimoto et al., 2017). In the field of machine learning, multi-task learning (MTL) aims to leverage useful information shared across multiple related tasks to improve the generalization performance on all tasks (Caruana, 1997). In deep neural networks, it is generally achieved by sharing part of the network across tasks.

An earlier survey divided the two "how to share" categories into five categories, including the feature learning, low-rank, task clustering, task relation learning, and decomposition approaches; Crawshaw (2020) presented more recent models in both single-domain and multi-modal architectures, as well as an overview of optimization methods in MTL. Nevertheless, it is still not clearly understood how to design and train a single model to handle a variety of NLP tasks according to task relatedness. Especially when faced with a set of tasks that have seldom been trained together before, it is of crucial importance that researchers find proper auxiliary tasks and assess the feasibility of such a multi-task learning attempt.
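Since the excerpt notes that MTL in deep networks is generally achieved by sharing part of the network, the sketch below shows the standard hard-parameter-sharing pattern (a shared encoder with task-specific heads). It is a generic illustration, not an architecture taken from the survey.

```python
# Minimal hard-parameter-sharing setup: one shared encoder, one head per
# task. A generic illustration of "sharing part of the network" in MTL,
# not an architecture taken from the survey.
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, vocab_size=10000, hidden=128, task_num_labels=(2, 5)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)             # shared
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)   # shared
        self.heads = nn.ModuleList(                               # task-specific
            [nn.Linear(hidden, n) for n in task_num_labels]
        )

    def forward(self, token_ids, task_id):
        x = self.embed(token_ids)
        _, h = self.encoder(x)           # h: (1, batch, hidden)
        return self.heads[task_id](h.squeeze(0))

model = MultiTaskModel()
batch = torch.randint(0, 10000, (4, 16))   # 4 sentences, 16 tokens each
logits_task0 = model(batch, task_id=0)     # shape (4, 2)
logits_task1 = model(batch, task_id=1)     # shape (4, 5)
```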
Validating Label Consistency in NER Data Annotation
Zeng, Qingkai, Yu, Mengxia, Yu, Wenhao, Jiang, Tianwen, Weninger, Tim, Jiang, Meng
Data annotation plays a crucial role in ensuring that named entity recognition (NER) models are trained on correctly labeled data. Producing accurate labels is challenging due to the complexity involved in annotation. Label inconsistency between multiple subsets of data annotation (e.g., the training set and test set, or multiple training subsets) is an indicator of label mistakes. In this work, we present an empirical method to explore the relationship between label (in-)consistency and NER model performance. It can be used to validate the label consistency (or catch the inconsistency) in multiple sets of NER data annotation. In experiments, our method identified label inconsistency in the test data of the SCIERC and CoNLL03 datasets (with 26.7% and 5.4% label mistakes, respectively). It also validated the consistency of the corrected versions of both datasets.
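As a simple illustration of what label (in-)consistency between annotation subsets can look like, the sketch below compares the majority label of each mention string across two subsets. It is a hypothetical surface-level check with made-up example labels, not the paper's empirical method.

```python
# Simple illustration of surface-level label (in-)consistency between two
# annotated NER subsets: for each mention string, compare its majority
# label in each subset. Not the paper's empirical method.
from collections import Counter, defaultdict

def majority_labels(annotations):
    """annotations: iterable of (mention_text, entity_label) pairs."""
    counts = defaultdict(Counter)
    for mention, label in annotations:
        counts[mention.lower()][label] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}

def label_inconsistency(train_anns, test_anns):
    """Fraction of shared mentions whose majority label differs across subsets."""
    train_maj, test_maj = majority_labels(train_anns), majority_labels(test_anns)
    shared = train_maj.keys() & test_maj.keys()
    if not shared:
        return 0.0
    disagree = [m for m in shared if train_maj[m] != test_maj[m]]
    return len(disagree) / len(shared)

train = [("BERT", "Method"), ("SQuAD", "Material"), ("BERT", "Method")]
test = [("BERT", "Generic"), ("SQuAD", "Material")]
print(label_inconsistency(train, test))  # 0.5: "bert" is labeled differently
```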