AITopics | examinee

A Closed-Form Solution for Fast and Reliable Adaptive Testing

Neural Information Processing Systems | Jun-19-2026, 15:25:10 GMT

Human ability estimation is essential for educational assessment, career advancement, and professional certification. Adaptive Testing systems can improve estimation efficiency by selecting fewer, targeted questions, and are widely used in exams such as the GRE, GMAT, and Duolingo English Test. However, selecting an optimal subset of questions remains a challenging nested optimization problem. Existing methods rely on costly approximations or data-intensive training, making them unsuitable for today's large-scale and complex testing environments. Thus, we propose a Closed-Form solution for question subset selection in Adaptive Testing. It directly minimizes ability estimation error by reducing the gradient bias of the ability parameter while maintaining Hessian stability, which enables a simple greedy algorithm for question selection. Moreover, it can quantify the impact of human behavioral perturbations on ability estimation. Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods, while maintaining the same estimation accuracy.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Asia > China (0.28)
North America > United States > New York (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(3 more...)

Evaluating LLM-Generated Q&A Test: a Student-Centered Study

Wróblewska, Anna, Grabek, Bartosz, Świstak, Jakub, Dan, Daniel

arXiv.org Artificial Intelligence | Aug-8-2025

This research prepares an automatic pipeline for generating reliable question-answer (Q&A) tests using AI chatbots. We automatically generated a GPT-4o-mini-based Q&A test for a Natural Language Processing course and evaluated its psychometric and perceived-quality metrics with students and experts. A mixed-format IRT analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while student and expert star ratings reflect high overall quality. A uniform DIF check identified two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.

large language model, machine learning, natural language, (20 more...)

doi: 10.1007/978-3-031-98417-4_20

arXiv: 2505.06591

Country: Europe > Austria (0.28)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.69)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Enhancing Security and Strengthening Defenses in Automated Short-Answer Grading Systems

Yarmohammadtoosky, Sahar, Zhou, Yiyun, Yaneva, Victoria, Baldwin, Peter, Rezayi, Saed, Clauser, Brian, Harikeo, Polina

arXiv.org Artificial Intelligence | May-2-2025

This study examines vulnerabilities in transformer-based automated short-answer grading systems used in medical education, with a focus on how these systems can be manipulated through adversarial gaming strategies. Our research identifies three main types of gaming strategies that exploit the system's weaknesses, potentially leading to false positives. To counteract these vulnerabilities, we implement several adversarial training methods designed to enhance the systems' robustness. Our results indicate that these methods significantly reduce the susceptibility of grading systems to such manipulations, especially when combined with ensemble techniques like majority voting and ridge regression, which further improve the system's defense against sophisticated adversarial inputs. Additionally, employing large language models such as GPT-4 with varied prompting techniques has shown promise in recognizing and scoring gaming strategies effectively. The findings underscore the importance of continuous improvements in AI-driven educational tools to ensure their reliability and fairness in high-stakes settings.

large language model, machine learning, natural language, (21 more...)

arXiv: 2505.00061

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education > Assessment & Standards > Student Performance (0.81)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Finding Words Associated with DIF: Predicting Differential Item Functioning using LLMs and Explainable AI

Maeda, Hotaka, Lu, Yikai

arXiv.org Artificial Intelligence | Feb-10-2025

We fine-tuned and compared several encoder-based Transformer large language models (LLMs) to predict differential item functioning (DIF) from the item text. We then applied explainable artificial intelligence (XAI) methods to these models to identify specific words associated with DIF. The data included 42,180 items designed for English language arts and mathematics summative state assessments among students in grades 3 to 11. Prediction R² ranged from .04 to .32 among eight focal and reference group pairs. Our findings suggest that many words associated with DIF reflect minor sub-domains included in the test blueprint by design, rather than construct-irrelevant item content that should be removed from assessments. This may explain why qualitative reviews of DIF items often yield confusing or inconclusive results. Our approach can be used to screen words associated with DIF during the item-writing process for immediate revision, or to help review traditional DIF analysis results by highlighting key words in the text. Extensions of this research can enhance the fairness of assessment programs, especially those that lack resources to build high-quality items, and among smaller subpopulations where we do not have sufficient sample sizes for traditional DIF analyses.

large language model, machine learning, natural language, (20 more...)

arXiv: 2502.07017

Country: North America > United States > California > Santa Cruz County > Santa Cruz (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

LLMzSzŁ: a comprehensive LLM benchmark for Polish

Jassem, Krzysztof, Ciesiółka, Michał, Graliński, Filip, Jabłoński, Piotr, Pokrywka, Jakub, Kubis, Marek, Jabłońska, Monika, Staruch, Ryszard

arXiv.org Artificial Intelligence | Jan-4-2025

This article introduces the first comprehensive benchmark for the Polish language at this scale: LLMzSzŁ (LLMs Behind the School Desk). It is based on a coherent collection of Polish national exams, including both academic and professional tests extracted from the archives of the Polish Central Examination Board. It covers 4 types of exams, coming from 154 domains. Altogether, it consists of almost 19k closed-ended questions. We investigate the performance of open-source multilingual, English, and Polish LLMs to verify LLMs' abilities to transfer knowledge between languages. Also, the correlation between LLMs and humans at model accuracy and exam pass rate levels is examined. We show that multilingual LLMs can obtain superior results over monolingual ones; however, monolingual models may be beneficial when model size matters. Our analysis highlights the potential of LLMs in assisting with exam validation, particularly in identifying anomalies or errors in examination tasks.

exam, large language model, machine learning, (18 more...)

arXiv: 2501.02266

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > K-12 Education (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Diffusion-Inspired Cold Start with Sufficient Prior in Computerized Adaptive Testing

Ma, Haiping, Xia, Aoqing, Wang, Changqian, Wang, Hai, Zhang, Xingyi

arXiv.org Artificial Intelligence | Nov-18-2024

Computerized Adaptive Testing (CAT) aims to select the most appropriate questions based on the examinee's ability and is widely used in online education. However, existing CAT systems often lack initial understanding of the examinee's ability, requiring random probing questions. This can lead to poorly matched questions, extending the test duration and negatively impacting the examinee's mindset, a phenomenon referred to as the Cold Start with Insufficient Prior (CSIP) task. This issue occurs because CAT systems do not effectively utilize the abundant prior information about the examinee available from other courses on online platforms. These response records, due to the commonality of cognitive states across different knowledge domains, can provide valuable prior information for the target domain. However, no prior work has explored solutions for the CSIP task. In response to this gap, we propose Diffusion Cognitive States TransfeR Framework (DCSR), a novel domain transfer framework based on Diffusion Models (DMs) to address the CSIP task. Specifically, we construct a cognitive state transition bridge between domains, guided by the common cognitive states of examinees, encouraging the model to reconstruct the initial ability state in the target domain. To enrich the expressive power of the generated data, we analyze the causal relationships in the generation process from a causal perspective. Redundant and extraneous cognitive states can lead to limited transfer and negative transfer effects. Our DCSR can seamlessly apply the generated initial ability states in the target domain to existing question selection algorithms, thus improving the cold start performance of the CAT system. Extensive experiments conducted on five real-world datasets demonstrate that DCSR significantly outperforms existing baseline methods in addressing the CSIP task.

artificial intelligence, examinee, machine learning, (17 more...)

arXiv: 2411.12182

Country:

North America > Canada > Ontario > Toronto (0.05)
Asia > China > Anhui Province > Hefei (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Jiangsu Province > Yancheng (0.04)

Genre: Research Report (0.50)

Industry:

Education > Educational Setting > Online (0.86)
Education > Educational Technology > Educational Software > Computer Based Training (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Implicit assessment of language learning during practice as accurate as explicit testing

Hou, Jue, Katinskaia, Anisia, Vu, Anh-Duc, Yangarber, Roman

arXiv.org Artificial Intelligence | Sep-24-2024

Assessment of the learner's proficiency is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions, and in exercises during practice sessions. Exhaustive testing across a wide range of skills can provide a detailed picture of proficiency, but may be undesirable for a number of reasons. Therefore, we first aim to replace exhaustive tests with efficient but accurate adaptive tests. We use learner data collected from exhaustive tests under imperfect conditions to train an IRT model to guide adaptive tests. Simulations and experiments with real learner data confirm that this approach is efficient and accurate. Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing. We transform learner data collected from exercise sessions into a form that can be used for IRT modeling. This is done by linking the exercises to linguistic constructs; the constructs are then treated as "items" within IRT. We present results from large-scale studies with thousands of learners. Using teacher assessments of student ability as "ground truth," we compare the estimates obtained from tests vs. those from exercises. The experiments confirm that the IRT models can produce accurate ability estimation based on exercises.
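As a rough illustration of the IRT-based ability estimation this abstract refers to, the sketch below fits a single learner's ability under a two-parameter logistic (2PL) model by grid search over the log-likelihood. The 2PL form, the grid search, and the names `p_correct` and `estimate_ability` are assumptions chosen for illustration; the abstract does not say which IRT variant or estimator the authors use.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of ability.

    responses: list of 0/1 outcomes, one per item.
    items: list of (a, b) pairs (hypothetical, pre-calibrated parameters).
    The search is bounded to [lo, hi], so all-correct or all-wrong
    response patterns return a boundary value instead of diverging.
    """
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for y, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            ll += math.log(p) if y else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Three items of increasing difficulty, equal discrimination.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
stronger = estimate_ability([1, 1, 0], items)
weaker = estimate_ability([1, 0, 0], items)
```

In this toy setup the learner who answers two of three items correctly receives a higher estimate than the one who answers only the easiest item. A production system would normally use Newton-Raphson or a Bayesian estimator in place of the grid.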

artificial intelligence, learner, machine learning, (18 more...)

arXiv: 2409.16133

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry:

Education > Curriculum > Subject-Specific Education (0.85)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Survey of Models for Cognitive Diagnosis: New Developments and Future Directions

Wang, Fei, Gao, Weibo, Liu, Qi, Li, Jiatong, Zhao, Guanhao, Zhang, Zheng, Huang, Zhenya, Zhu, Mengxiao, Wang, Shijin, Tong, Wei, Chen, Enhong

arXiv.org Artificial Intelligence | Jul-7-2024

Cognitive diagnosis has been developed for decades as an effective measurement tool to evaluate human cognitive status such as ability level and knowledge mastery. It has been applied to a wide range of fields including education, sport, psychological diagnosis, etc. By providing better awareness of cognitive status, it can serve as the basis for personalized services such as well-designed medical treatment, teaching strategy and vocational training. This paper aims to provide a survey of current models for cognitive diagnosis, with more attention on new developments using machine learning-based methods. By comparing the model structures, parameter estimation algorithms, model evaluation methods and applications, we provide a relatively comprehensive review of the recent trends in cognitive diagnosis models. Further, we discuss future directions that are worthy of exploration. In addition, we release two Python libraries: EduData for easy access to some relevant public datasets we have collected, and EduCDM that implements popular CDMs to facilitate both applications and research purposes.

cognitive diagnosis, diagnosis, examinee, (14 more...)

arXiv: 2407.05458

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(8 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > K-12 Education (1.00)
Education > Educational Setting > Online (0.93)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Survey of Computerized Adaptive Testing: A Machine Learning Perspective

Liu, Qi, Zhuang, Yan, Bi, Haoyang, Huang, Zhenya, Huang, Weizhe, Li, Jiatong, Yu, Junhao, Liu, Zirui, Hu, Zirui, Hong, Yuting, Pardos, Zachary A., Ma, Haiping, Zhu, Mengxiao, Wang, Shijin, Chen, Enhong

arXiv.org Artificial Intelligence | Apr-4-2024

Computerized Adaptive Testing (CAT) provides an efficient and tailored method for assessing the proficiency of examinees, by dynamically adjusting test questions based on their performance. Widely adopted across diverse fields like education, healthcare, sports, and sociology, CAT has revolutionized testing practices. While traditional methods rely on psychometrics and statistics, the increasing complexity of large-scale testing has spurred the integration of machine learning techniques. This paper aims to provide a machine learning-focused survey on CAT, presenting a fresh perspective on this adaptive testing method. By examining the test question selection algorithm at the heart of CAT's adaptivity, we shed light on its functionality. Furthermore, we delve into cognitive diagnosis models, question bank construction, and test control within CAT, exploring how machine learning can optimize these components. Through an analysis of current methods, strengths, limitations, and challenges, we strive to develop robust, fair, and efficient CAT systems. By bridging psychometric-driven CAT research with machine learning, this survey advocates for a more inclusive and interdisciplinary approach to the future of adaptive testing.
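To make the "test question selection algorithm" mentioned in this abstract concrete, the sketch below shows one classical rule from the psychometric side of CAT: pick the unadministered item with the largest Fisher information at the current ability estimate. It assumes a two-parameter logistic (2PL) model, and the names `fisher_information`, `select_next_item`, and the item bank are hypothetical; the survey covers many other selection strategies, including learned ones.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, bank, administered):
    """Return the index of the unadministered item that is most
    informative at the current ability estimate theta."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta, *bank[i]))

# Hypothetical item bank of (a, b) pairs.
bank = [(1.0, -1.0), (1.0, 0.1), (1.0, 3.0)]
first = select_next_item(0.0, bank, set())   # item closest in difficulty to theta
second = select_next_item(0.0, bank, {first})
```

In a full CAT loop, the ability estimate would be updated after each response and the selection repeated until a stopping rule (test length or estimate precision) is met.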

computerized adaptive testing, examinee, proficiency, (12 more...)

arXiv: 2404.00712

Country:

Europe > Spain (0.14)
Asia > China (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)
(8 more...)

Genre:

Overview (1.00)
Questionnaire & Opinion Survey (0.92)
Research Report (0.81)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Government > Regional Government (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)
(5 more...)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(6 more...)

Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective

Zhuang, Yan, Liu, Qi, Ning, Yuting, Huang, Weizhe, Lv, Rui, Huang, Zhenya, Zhao, Guanhao, Zhang, Zheng, Mao, Qingyang, Wang, Shijin, Chen, Enhong

arXiv.org Artificial Intelligence | Oct-28-2023

Large language models (LLMs), like ChatGPT, have shown some human-like cognitive abilities. To compare these abilities across models, several benchmarks (i.e., sets of standard test questions) from different fields (e.g., Literature, Biology and Psychology) are often adopted, and test results under traditional metrics such as accuracy, recall and F1 are reported. However, this way of evaluating LLMs can be inefficient and inaccurate from the cognitive science perspective. Inspired by Computerized Adaptive Testing (CAT) used in psychometrics, we propose an adaptive testing framework for LLM evaluation. Rather than using a standard test set and simply reporting accuracy, this approach dynamically adjusts the characteristics of the test questions, such as difficulty, based on the model's performance. This allows for a more accurate estimation of the model's abilities, using fewer questions. More importantly, it allows LLMs to be compared with humans easily, which is essential for NLP models that aim for human-level ability. Our diagnostic reports have found that ChatGPT often behaves like a "careless student", prone to slips and occasionally guessing at questions. We conduct a fine-grained diagnosis and rank the latest 6 instruction-tuned LLMs from three aspects of Subject Knowledge, Mathematical Reasoning, and Programming, where GPT4 can outperform other models significantly and reach the cognitive ability of middle-level students. Different tests for different models using efficient adaptive testing: we believe this has the potential to become a new norm in evaluating large language models.

chatgpt, llm, student, (14 more...)

arXiv: 2306.10512

Country:

Asia > China (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry:

Education > Educational Setting > Online (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.81)
Health & Medicine > Therapeutic Area > Neurology (0.81)
Education > Educational Technology > Educational Software > Computer Based Training (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
