
Collaborating Authors

test taker


Fantastic Bugs and Where to Find Them in AI Benchmarks

Truong, Sang, Tu, Yuheng, Hardy, Michael, Reuel, Anka, Tang, Zeyu, Burapacheep, Jirayu, Perera, Jonathan, Uwakwe, Chibuike, Domingue, Ben, Haber, Nick, Koyejo, Sanmi

arXiv.org Artificial Intelligence

Benchmarks are pivotal in driving AI progress, and invalid benchmark questions frequently undermine their reliability. Manually identifying and correcting errors among thousands of benchmark questions is not only infeasible but also a critical bottleneck for reliable evaluation. In this work, we introduce a framework for systematic benchmark revision that leverages statistical analysis of response patterns to flag potentially invalid questions for further expert review. Our approach builds on a core assumption commonly used in AI evaluations that the mean score sufficiently summarizes model performance. This implies a unidimensional latent construct underlying the measurement experiment, yielding expected ranges for various statistics for each item. When empirically estimated values for these statistics fall outside the expected range for an item, the item is more likely to be problematic. Across nine widely used benchmarks, our method guides expert review to identify problematic questions with up to 84% precision. In addition, we introduce an LLM-judge first pass to review questions, further reducing human effort. Together, these components provide an efficient and scalable framework for systematic benchmark revision.
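The flagging idea in this abstract — items whose statistics fall outside the range implied by a unidimensional construct — can be illustrated with a classical item statistic. The sketch below is not the authors' code; the function name, the binary score matrix, and the zero threshold are illustrative assumptions. It computes each item's corrected item-total (rest-score) correlation across models and flags items that discriminate negatively, i.e., items that stronger models get wrong more often.

```python
import numpy as np

def flag_items(responses, threshold=0.0):
    """Flag items whose corrected item-total correlation falls below
    `threshold`; negative discrimination suggests an invalid item.

    responses: (n_models, n_items) binary 0/1 score matrix.
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    flags = []
    for j in range(n_items):
        rest = np.delete(responses, j, axis=1).mean(axis=1)  # rest-score
        item = responses[:, j]
        if item.std() == 0 or rest.std() == 0:
            flags.append(j)   # zero variance: item carries no information
            continue
        r = np.corrcoef(item, rest)[0, 1]
        if r < threshold:
            flags.append(j)   # misaligned with the latent trait
    return flags
```

On a toy matrix where three items follow a clean ability ordering and a fourth anticorrelates with it, only the fourth item is flagged; in practice the flagged set would go to expert review rather than being dropped automatically.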


Automatic Detection of Inauthentic Templated Responses in English Language Assessments

Samant, Yashad, Becker, Lee, Hellman, Scott, Behan, Bradley, Hughes, Sarah, Southerland, Joshua

arXiv.org Artificial Intelligence

Pearson Education, Inc. In this study, we introduce the automated detection of inauthentic, templated responses (AuDITR) task, describe a machine learning-based approach to this task, and illustrate the importance of regularly updating these models in production. English language proficiency (ELP) tests carry exceptionally high stakes because of how they influence access to employment, education, and national residency status.


The Polish Vocabulary Size Test: A Novel Adaptive Test for Receptive Vocabulary Assessment

Fokin, Danil, Płużyczka, Monika, Golovin, Grigory

arXiv.org Artificial Intelligence

We present the Polish Vocabulary Size Test (PVST), a novel tool for assessing the receptive vocabulary size of both native and non-native Polish speakers. Based on Item Response Theory and Computerized Adaptive Testing, PVST dynamically adjusts to each test-taker's proficiency level, ensuring high accuracy while keeping the test duration short. To validate the test, a pilot study was conducted with 1,475 participants. Native Polish speakers demonstrated significantly larger vocabularies compared to non-native speakers. For native speakers, vocabulary size showed a strong positive correlation with age. The PVST is available online at myvocab.info/pl.
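Computerized adaptive testing of the kind PVST describes re-estimates the test-taker's ability after every response. A minimal sketch of that update step under a Rasch (1PL) model — illustrative only, not the PVST implementation; function names and the fixed iteration count are assumptions — uses Newton-Raphson on the log-likelihood of the responses observed so far:

```python
import numpy as np

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def update_theta(theta, b, y, steps=10):
    """Newton-Raphson MLE of ability `theta` given administered item
    difficulties `b` and 0/1 responses `y`. Assumes a mixed response
    pattern (all-correct or all-wrong patterns have no finite MLE)."""
    b = np.asarray(b, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(steps):
        p = rasch_p(theta, b)
        grad = np.sum(y - p)          # d log-likelihood / d theta
        hess = -np.sum(p * (1 - p))   # second derivative, always negative
        theta -= grad / hess          # Newton step toward the maximum
    return theta
```

For example, answering items of difficulty -1 and 0 correctly but missing a difficulty-1 item places the ability estimate a little above the middle item, around 0.8 on the logit scale.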


Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?

Säuberli, Andreas, Frassinelli, Diego, Plank, Barbara

arXiv.org Artificial Intelligence

Knowing how test takers answer items in educational assessments is essential for test development, to evaluate item quality, and to improve test validity. However, this process usually requires extensive pilot studies with human participants. If large language models (LLMs) exhibit human-like response behavior to test items, this could open up the possibility of using them as pilot participants to accelerate test development. In this paper, we evaluate the human-likeness or psychometric plausibility of responses from 18 instruction-tuned LLMs with two publicly available datasets of multiple-choice test items across three subjects: reading, U.S. history, and economics. Our methodology builds on two theoretical frameworks from psychometrics which are commonly used in educational assessment, classical test theory and item response theory. The results show that while larger models are excessively confident, their response distributions can be more human-like when calibrated with temperature scaling. In addition, we find that LLMs tend to correlate better with humans in reading comprehension items compared to other subjects. However, the correlations are not very strong overall, indicating that LLMs should not be used for piloting educational assessments in a zero-shot setting.
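The calibration step mentioned here, temperature scaling, divides a model's logits by a scalar T > 1 before the softmax, flattening an overconfident response distribution without changing which option is ranked first. A minimal sketch (function names are illustrative, not the paper's code):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def temperature_scale(logits, T):
    """Soften (T > 1) or sharpen (T < 1) a logit vector; T = 1 is identity."""
    return softmax(np.asarray(logits, dtype=float) / T)
```

Because scaling by T preserves the ordering of the logits, the argmax answer is unchanged; only the confidence spread moves toward (or away from) uniform, which is what makes the scaled distributions comparable to human response distributions.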


After exam fiasco, California State Bar faces deeper financial crisis

Los Angeles Times

The California State Bar's botched rollout of a new exam -- a move that the cash-strapped agency made in the hopes of saving money -- could ultimately end up costing it an additional $5.6 million. Leah T. Wilson, executive director of the State Bar, told state lawmakers at a Senate Judiciary hearing Tuesday that the agency expects to pay around $3 million to offer free exams to test takers, an additional $2 million to book in-person testing sites in July, and $620,000 to return the test to its traditional system of multiple-choice questions in July. Wilson, who announced last week she will step down when her term ends this summer, revealed the costs during a 90-minute hearing called by Sen. Thomas J. Umberg (D-Orange), chair of the Senate Judiciary Committee, to find out what went so "spectacularly wrong." Chaos ensued in February when thousands of test takers seeking to practice law in California sat for the new exam. Some reported they couldn't log into the exam because online testing platforms repeatedly crashed.


Head of State Bar of California to step down after exam fiasco

Los Angeles Times

The State Bar of California announced Friday that its embattled leader, who has faced growing pressure to resign over the botched February rollout of a new bar exam, will step down in July. Leah T. Wilson, the agency's executive director, informed the Board of Trustees she will not seek another term in the position she has held on and off since 2017. She also apologized for her role in the February bar exam chaos. "Accountability is a bedrock principle for any leader," Wilson said in a statement. "At the end of the day, I am responsible for everything that occurs within the organization. Despite our best intentions, the experiences of applicants for the February Bar Exam simply were unacceptable, and I fully recognize the frustration and stress this experience caused. While there are no words to assuage those emotions, I do sincerely apologize."


Pressure grows on State Bar of California to revert to national exam format in July after botched exam

Los Angeles Times

An influential California legislator is pressuring the State Bar of California to ditch its new multiple-choice questions after a February bar exam debacle and revert to the traditional test format in July. "Given the catastrophe of the February bar, I think that going back to the methods that have been used for the last 50 years -- until we can adequately test what new methods may be employed -- is the appropriate way to go," Sen. Tom Umberg (D-Orange), chair of the state Senate Judiciary Committee, told The Times. Thousands of test takers seeking to practice law in California typically take the two-day bar exam in July. Reverting to the national system by the National Conference of Bar Examiners, which California has used since 1972, would be a major retreat for the embattled State Bar. Its new exam was rolled out this year as a cost-cutting measure and "historic agreement" that would offer test takers the choice of remote testing.


State Bar of California admits it used AI to develop exam questions, triggering new furor

Los Angeles Times

Nearly two months after hundreds of prospective California lawyers complained that their bar exams were plagued with technical problems and irregularities, the state's legal licensing body has caused fresh outrage by admitting that some multiple-choice questions were developed with the aid of artificial intelligence. The State Bar of California said in a news release Monday that it will ask the California Supreme Court to adjust test scores for those who took its February bar exam. But it declined to acknowledge significant problems with its multiple-choice questions -- even as it revealed that a subset of questions were recycled from a first-year law student exam, while others were developed with the assistance of AI by ACS Ventures, the State Bar's independent psychometrician. "The debacle that was the February 2025 bar exam is worse than we imagined," said Mary Basick, assistant dean of academic skills at UC Irvine Law School. Having the questions drafted by non-lawyers using ...


Reliable and Efficient Amortized Model-based Evaluation

Truong, Sang, Tu, Yuheng, Liang, Percy, Li, Bo, Koyejo, Sanmi

arXiv.org Artificial Intelligence

Comprehensive evaluations of language models (LMs) during both development and deployment phases are necessary because these models possess numerous capabilities (e.g., mathematical reasoning, legal support, or medical diagnostics) as well as safety risks (e.g., racial bias, toxicity, or misinformation). The average score across a wide range of benchmarks provides a signal that helps guide the use of these LMs in practice. Currently, holistic evaluations are costly due to the large volume of benchmark questions, making frequent evaluations impractical. A popular attempt to lower the cost is to compute the average score on a subset of the benchmark. This approach, unfortunately, often renders an unreliable measure of LM performance because the average score is often confounded with the difficulty of the questions in the benchmark subset. Item response theory (IRT) was designed to address this challenge, providing a reliable measurement by carefully controlling for question difficulty. Unfortunately, question difficulty is expensive to estimate. Facing this challenge, we train a model that predicts question difficulty from its content, enabling a reliable measurement at a fraction of the cost. In addition, we leverage this difficulty predictor to further improve the evaluation efficiency by training a question generator conditioned on a difficulty level. This question generator is essential in adaptive testing, where, instead of using a random subset of the benchmark questions, informative questions are adaptively chosen based on the current estimation of LLM performance. Experiments on 22 common natural language benchmarks and 172 LMs show that this approach is more reliable and efficient compared to current common practice.
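The adaptive selection step this abstract describes chooses, at each round, the question that is most informative at the current ability estimate. Under a two-parameter logistic (2PL) IRT model, item j's Fisher information at ability theta is a_j^2 * p_j * (1 - p_j). A minimal sketch of that selection rule (illustrative, not the paper's implementation; names are assumptions):

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: discrimination a, difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def next_item(theta, a, b, asked):
    """Return the index of the unasked item with maximal Fisher
    information I_j(theta) = a_j^2 * p_j * (1 - p_j) at `theta`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    p = p_correct(theta, a, b)
    info = a**2 * p * (1 - p)
    info[list(asked)] = -np.inf   # exclude already-administered items
    return int(np.argmax(info))
```

Information peaks where p = 0.5, so with equal discriminations the rule picks the item whose difficulty is closest to the current ability estimate, which is why a difficulty-conditioned question generator slots naturally into this loop.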


Test Security in Remote Testing Age: Perspectives from Process Data Analytics and AI

Hao, Jiangang, Fauss, Michael

arXiv.org Artificial Intelligence

The COVID-19 pandemic has accelerated the implementation and acceptance of remotely proctored high-stakes assessments. While the flexible administration of the tests brings forth many values, it raises test security-related concerns. Meanwhile, artificial intelligence (AI) has witnessed tremendous advances in the last five years. Many AI tools (such as the very recent ChatGPT) can generate high-quality responses to test items. These new developments require test security research beyond the statistical analysis of scores and response time. Data analytics and AI methods based on clickstream process data can give us deeper insight into the test-taking process and hold great promise for securing remotely administered high-stakes tests. This chapter uses real-world examples to show that this is indeed the case.