short answer
AI-Enabled grading with near-domain data for scaling feedback with human-level accuracy
Agarwal, Shyam, Moghimi, Ali, Haudek, Kevin C.
Constructed-response questions are crucial for encouraging generative processing and testing a learner's understanding of core concepts. However, limited instructor time, large class sizes, and other resource constraints pose significant challenges to providing the timely, detailed evaluation that a holistic educational experience requires. Providing frequent assessments is also challenging: manual grading is labor-intensive, and automated grading is difficult to generalize to every possible response. This paper proposes a novel and practical approach to grading short-answer constructed-response questions. We discuss why this problem is challenging, define the nature of the questions on which our method works, and propose a framework that instructors can use to evaluate their students' open responses using near-domain data, such as responses to similar questions administered in previous years. The proposed method outperforms state-of-the-art machine learning models as well as non-fine-tuned large language models such as GPT-3.5, GPT-4, and GPT-4o by a considerable margin, over 10-20% in some cases, even after the LLMs are provided with reference/model answers. Our framework does not require pre-written grading rubrics and is designed explicitly with practical classroom settings in mind. Our results also reveal exciting insights about learning from near-domain data, including what we term accuracy and data advantages of human-labeled data, and we believe this is the first work to formalize the problem of automated short-answer grading from near-domain data.
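To make the near-domain idea concrete, here is a minimal sketch of the workflow the abstract describes: train a grader on human-labeled responses to a similar question from a previous year, then apply it to this year's unlabeled responses. The file names, columns, and the TF-IDF/logistic-regression model are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the near-domain idea: fit a grader on labeled
# responses to a similar question from a previous year, then score
# this year's responses. File names and columns are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Near-domain data: human-graded answers to a similar prior question.
prior = pd.read_csv("prior_year_responses.csv")   # columns: response, label
current = pd.read_csv("current_responses.csv")    # column: response

grader = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
grader.fit(prior["response"], prior["label"])

# Predicted grades for the new, unlabeled responses.
current["predicted_label"] = grader.predict(current["response"])
```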
- North America > United States > Michigan (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Instructional Material (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education > Curriculum (1.00)
- Education > Educational Setting > Online (0.93)
- (3 more...)
LLM-based Automated Grading with Human-in-the-Loop
Chu, Yucheng, Li, Hang, Yang, Kaiqi, Copur-Gencturk, Yasemin, Tang, Jiliang
The rise of artificial intelligence (AI) technologies, particularly large language models (LLMs), has brought significant advancements to the field of education. Among various applications, automatic short answer grading (ASAG), which focuses on evaluating open-ended textual responses, has seen remarkable progress with the introduction of LLMs. These models not only enhance grading performance compared to traditional ASAG approaches but also move beyond simple comparisons with predefined "golden" answers, enabling more sophisticated grading scenarios, such as rubric-based evaluation. However, existing LLM-powered methods still face challenges in achieving human-level grading performance in rubric-based assessments due to their reliance on fully automated approaches. In this work, we explore the potential of LLMs in ASAG tasks by leveraging their interactive capabilities through a human-in-the-loop (HITL) approach. Our proposed framework, GradeHITL, utilizes the generative properties of LLMs to pose questions to human experts, incorporating their insights to refine grading rubrics dynamically. This adaptive process significantly improves grading accuracy, outperforming existing methods and bringing ASAG closer to human-level evaluation.
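A minimal sketch of the interactive loop described above, assuming a generic chat-completion client: the LLM grades with the current rubric, poses a clarifying question to a human expert, and revises the rubric using the expert's reply. The prompts, the `ask_llm` stub, and the fixed round count are assumptions, not the GradeHITL code.

```python
# Sketch of a human-in-the-loop rubric-refinement cycle in the spirit of
# GradeHITL. `ask_llm` stands in for any chat-completion call; prompts
# and the stopping rule are assumptions, not the authors' code.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def refine_rubric(rubric: str, answers: list[str], rounds: int = 3) -> str:
    for _ in range(rounds):
        graded = [ask_llm(f"Rubric:\n{rubric}\n\nAnswer:\n{a}\n"
                          "Give a score and flag anything ambiguous.")
                  for a in answers]
        # The model proposes a clarifying question about ambiguous cases...
        question = ask_llm("Given these graded answers, ask the instructor "
                           "one question that would most improve the rubric:\n"
                           + "\n".join(graded))
        # ...and a human expert answers it.
        expert_reply = input(f"[LLM asks] {question}\n[Expert] ")
        rubric = ask_llm(f"Revise this rubric:\n{rubric}\n"
                         f"using the expert's clarification:\n{expert_reply}")
    return rubric
```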
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Michigan (0.05)
Towards Robust Mathematical Reasoning
Luong, Thang, Hwang, Dawsen, Nguyen, Hoang H., Ghiasi, Golnaz, Chervonyi, Yuri, Seo, Insuk, Kim, Junsu, Bingham, Garrett, Lee, Jonathan, Mishra, Swaroop, Zhai, Alex, Hu, Clara Huiyi, Michalewski, Henryk, Kim, Jimin, Ahn, Jeonghyun, Bae, Junhwi, Song, Xingyou, Trinh, Trieu H., Le, Quoc V., Jung, Junehyuk
Finding the right north-star metrics is critical for advancing the mathematical reasoning capabilities of foundation models, especially given that existing evaluations are either too easy or focus only on getting correct short answers. To address these issues, we present IMO-Bench, a suite of advanced reasoning benchmarks, vetted by a panel of top specialists, that specifically targets the level of the International Mathematical Olympiad (IMO), the most prestigious venue for young mathematicians. IMO-AnswerBench first tests models on 400 diverse Olympiad problems with verifiable short answers. IMO-ProofBench is the next-level evaluation of proof-writing capability, which includes both basic and advanced IMO-level problems as well as detailed grading guidelines to facilitate automatic grading. These benchmarks played a crucial role in our historic gold-level performance at IMO 2025 with Gemini Deep Think (Luong and Lockhart, 2025). Our model achieved 80.0% on IMO-AnswerBench and 65.7% on the advanced IMO-ProofBench, surpassing the best non-Gemini models by large margins of 6.9% and 42.4%, respectively. We also show that autograders built with Gemini reasoning correlate well with human evaluations, and we construct IMO-GradingBench, with 1000 human gradings of proofs, to enable further progress in the automatic evaluation of long-form answers. We hope that IMO-Bench will help the community advance robust mathematical reasoning, and we release it at https://imobench.github.io/.
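The autograder-versus-human comparison the abstract mentions reduces to an agreement statistic over paired scores. A toy sketch (the scores below are made up, not IMO-GradingBench data):

```python
# Compare an autograder's proof scores with human gradings; the real
# benchmark format is at https://imobench.github.io/.
from scipy.stats import pearsonr, spearmanr

human_scores = [7, 0, 3, 7, 1, 5, 7, 2]   # human grades per proof (toy data)
auto_scores  = [7, 1, 3, 6, 1, 5, 7, 3]   # autograder grades, same proofs

print("Pearson r: %.3f" % pearsonr(human_scores, auto_scores)[0])
print("Spearman rho: %.3f" % spearmanr(human_scores, auto_scores)[0])
```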
- North America > United States (0.04)
- Europe > Austria (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- (2 more...)
Is microwave cooking nuking all the nutrients?
Microwaves have been a kitchen staple since the late 1960s, but are they safe for our food? Originally used for radar and other technologies, the power of microwaves was first harnessed specifically for heating food in 1947. By the late 1960s, commercial microwave ovens were small and inexpensive enough to become fixtures of the modern kitchen.
- North America > United States > Nebraska > Lancaster County > Lincoln (0.05)
- Asia > Middle East > Jordan (0.05)
- Health & Medicine > Consumer Health (1.00)
- Education > Health & Safety > School Nutrition (0.51)
CaresAI at BioCreative IX Track 1 -- LLM for Biomedical QA
Abdel-Salam, Reem, Adewunmi, Mary, Abayomi, Modinat A.
Large language models (LLMs) are increasingly capable of accurate question answering across various domains. However, rigorous evaluation of their performance on complex question-answering (QA) tasks is essential before deployment in real-world biomedical and healthcare applications. This paper presents our approach to the MedHopQA track of the BioCreative IX shared task, which focuses on multi-hop biomedical question answering involving diseases, genes, and chemicals. We adopt a supervised fine-tuning strategy based on LLaMA 3 8B, enhanced with a curated biomedical question-answer dataset compiled from external sources including BioASQ, MedQuAD, and TREC. Three experimental setups are explored: fine-tuning on combined short and long answers, short answers only, and long answers only. While our models demonstrate strong domain understanding, achieving concept-level accuracy scores of up to 0.8, their Exact Match (EM) scores remain significantly lower, particularly in the test phase. We introduce a two-stage inference pipeline for precise short-answer extraction to mitigate verbosity and improve alignment with evaluation metrics. Despite partial improvements, challenges persist in generating strictly formatted outputs. Our findings highlight the gap between semantic understanding and exact answer evaluation in biomedical LLM applications, motivating further research into output control and post-processing strategies.
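A minimal sketch of the two-stage inference pipeline described above, with the `generate` stub standing in for the fine-tuned LLaMA 3 8B model; both prompts are illustrative assumptions, not the authors' exact wording.

```python
# Two-stage inference: first answer the question freely, then extract a
# strictly formatted short answer for exact-match scoring. `generate`
# stands in for the fine-tuned model; prompts are illustrative.
def generate(prompt: str) -> str:
    raise NotImplementedError("call your fine-tuned model here")

def answer_short(question: str) -> str:
    # Stage 1: unconstrained long-form answer.
    long_answer = generate(f"Question: {question}\nAnswer in detail:")
    # Stage 2: distill to the exact entity the EM metric expects.
    short = generate(
        "Extract only the final short answer (a disease, gene, or chemical "
        f"name) from this text, with no extra words:\n{long_answer}"
    )
    return short.strip()
```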
- Oceania > Australia (0.05)
- North America > Canada > Quebec > Montreal (0.05)
- North America > United States > Virginia (0.04)
- (5 more...)
Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading
Walsh, Joel, Mamidanna, Siddarth, Nye, Benjamin, Core, Mark, Auerbach, Daniel
Research on improving automated short answer grading (ASAG) has recently focused on large language models (LLMs) with prompt engineering and zero- or few-shot prompting to achieve the best results. This contrasts with the fine-tuning approach, which has historically required large-scale compute clusters inaccessible to most users. New closed-model approaches such as OpenAI's fine-tuning service promise results with as few as 100 examples, while open-weight methods such as quantized low-rank adaptation (QLoRA) can fine-tune models on consumer GPUs. We evaluate both of these fine-tuning methods, measuring their interaction with few-shot prompting for ASAG with structured (JSON) outputs. Our results show that fine-tuning with small amounts of data has limited utility for Llama open-weight models, but that fine-tuning can outperform few-shot baselines with OpenAI's closed, instruction-tuned models. While our evaluation set is limited, we find some evidence that the observed benefits of fine-tuning may depend on the domain subject matter. Lastly, we observed dramatic improvement with the Llama 3.1 8B-Instruct open-weight model by seeding the initial training examples with a significant amount of cheaply generated synthetic training data.
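For reference, a typical QLoRA setup of the kind the paper evaluates looks roughly like the sketch below; the model name, rank, and target modules are illustrative choices, not the authors' exact configuration.

```python
# QLoRA recipe: 4-bit quantization via bitsandbytes plus low-rank
# adapters via peft. Hyperparameters here are illustrative defaults.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", quantization_config=bnb
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)  # only adapter weights are trained
model.print_trainable_parameters()
```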
- North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- South America > Brazil > Pernambuco > Recife (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.47)
- Education > Educational Setting > Online (0.46)
- Education > Educational Technology > Educational Software > Computer Based Training (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)
An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case
Giachino, Gioele, Rondina, Marco, Vetrò, Antonio, Coppola, Riccardo, De Martin, Juan Carlos
The increasing use of large language models (LLMs) across a wide variety of domains has raised concerns about how easily they can perpetuate stereotypes and contribute to the generation of biased content. With a focus on gender and professional bias, this work examines how LLMs shape responses to ungendered prompts, contributing to biased outputs. The analysis uses a structured experimental method, issuing different prompts involving three combinations of professional roles, each characterized by a hierarchical relationship. The study uses Italian, a language with extensive grammatical gender marking, to highlight potential limitations in current LLMs' ability to generate objective text in non-English languages. Two popular LLM-based chatbots are examined, namely OpenAI ChatGPT (gpt-4o-mini) and Google Gemini (gemini-1.5-flash). Through their APIs, we collected 3600 responses. The results highlight how content generated by LLMs can perpetuate stereotypes: for example, Gemini associated 100% (ChatGPT 97%) of 'she' pronouns with the 'assistant' rather than the 'manager'. The presence of bias in AI-generated text can have significant implications in many fields, such as workplaces and job selection, raising ethical concerns about its use. Understanding these risks is pivotal to developing mitigation strategies and ensuring that AI-based systems do not increase social inequalities but instead contribute to more equitable outcomes. Future research directions include expanding the study to additional chatbots or languages, refining prompt engineering methods, and building a larger experimental base.
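A sketch of how such an experimental grid can be assembled: job pairs crossed with ungendered Italian prompt templates (whose grammatical gender the study exploits), with a `chat` stub in place of the real APIs. The pairs, templates, and stub are assumptions for illustration, not the authors' actual protocol.

```python
# Cross hierarchical job pairs with ungendered Italian templates and
# collect chatbot replies for downstream pronoun/gender tallying.
from itertools import product

job_pairs = [("manager", "assistente"), ("medico", "infermiere"),
             ("professore", "studente")]
templates = [
    "Scrivi un dialogo tra {a} e {b} che discutono di una scadenza.",
    "Descrivi una giornata di lavoro di {a} e {b}.",
]

def chat(prompt: str) -> str:
    raise NotImplementedError("call the chatbot API here")

responses = []
for (a, b), tpl in product(job_pairs, templates):
    prompt = tpl.format(a=a, b=b)  # ungendered Italian prompt
    responses.append((prompt, chat(prompt)))
# Downstream: count which role each gendered pronoun/article attaches to.
```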
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (4 more...)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
Leveraging AI Graders for Missing Score Imputation to Achieve Accurate Ability Estimation in Constructed-Response Tests
Evaluating the abilities of learners is a fundamental objective in the field of education. In particular, there is an increasing need to assess higher-order abilities such as expressive skills and logical thinking. Constructed-response tests such as short-answer and essay-based questions have become widely used as a method to meet this demand. Although these tests are effective, they require substantial manual grading, making them both labor-intensive and costly. Item response theory (IRT) provides a promising solution by enabling the estimation of ability from incomplete score data, where human raters grade only a subset of the answers provided by learners across multiple test items. However, the accuracy of ability estimation declines as the proportion of missing scores increases. Although data augmentation techniques for imputing missing scores have been explored to address this limitation, they are often inaccurate for sparse or heterogeneous data. To overcome these challenges, this study proposes a novel method for imputing missing scores by leveraging automated scoring technologies for accurate IRT-based ability estimation. The proposed method achieves high accuracy in ability estimation while markedly reducing the manual grading workload.
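A minimal sketch of the pipeline under stated assumptions: unobserved cells of a sparse learner-by-item score matrix are filled by an automated grader, then abilities are estimated with a simple Rasch (1PL) joint maximum-likelihood fit. The synthetic data and the `ai_grader` stand-in are illustrative; the paper's actual IRT model and scorer may differ.

```python
# Fill missing scores with an AI grader, then estimate abilities (theta)
# and item difficulties (b) for a Rasch model by gradient ascent.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(50, 8)).astype(float)  # true binary scores
mask = rng.random(scores.shape) < 0.6                     # 60% unobserved
observed = np.where(mask, np.nan, scores)

def ai_grader(i, j):
    # Stand-in for an automated scorer reading the learner's actual answer.
    return scores[i, j]

filled = observed.copy()
for i, j in zip(*np.where(np.isnan(filled))):
    filled[i, j] = ai_grader(i, j)

theta = np.zeros(filled.shape[0])   # learner abilities
b = np.zeros(filled.shape[1])       # item difficulties
for _ in range(300):
    p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
    resid = filled - p
    theta += 0.5 * resid.mean(axis=1)  # ascend the log-likelihood
    b -= 0.5 * resid.mean(axis=0)
    b -= b.mean()                      # anchor the scale for identifiability
print("estimated abilities:", np.round(theta[:5], 2))
```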
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Montserrat (0.04)
- (2 more...)
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.54)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation
Chu, Yucheng, He, Peng, Li, Hang, Han, Haoyu, Yang, Kaiqi, Xue, Yu, Li, Tingting, Krajcik, Joseph, Tang, Jiliang
Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs), which possess human-like ability in linguistic tasks, are increasingly popular for assisting human graders and reducing their workload. However, LLMs' limited domain knowledge restricts their understanding of task-specific requirements and hinders their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student-answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results on a science education dataset demonstrate that our system improves grading accuracy over baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support while providing efficient performance gains.
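A rough sketch of the retrieve-then-grade flow described above, using sentence-transformers for semantic search; the corpus, model name, prompt, and `ask_llm` stub are assumptions for illustration, not the authors' system.

```python
# Embed the question plus student answer, retrieve the closest reference
# passages, and grade with the retrieved context in the prompt.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "Energy is conserved in an isolated system.",
    "Photosynthesis converts light energy into chemical energy.",
    # ... curated educational reference passages ...
]
corpus_emb = encoder.encode(corpus, convert_to_tensor=True)

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your grading LLM here")

def grade(question: str, answer: str, k: int = 2) -> str:
    query_emb = encoder.encode(question + " " + answer, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]
    context = "\n".join(corpus[h["corpus_id"]] for h in hits)
    return ask_llm(f"Reference material:\n{context}\n\n"
                   f"Question: {question}\nStudent answer: {answer}\n"
                   "Grade the answer as correct/partial/incorrect and explain.")
```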
- North America > United States > Michigan (0.41)
- North America > United States > Washington (0.05)