AITopics

Deep learning (DL) has shown significant promise in medical imaging applications, especially in the detection and diagnosis of conditions such as breast cancer through mammogram analysis [1]. Despite their increasing adoption, deep neural networks (DNNs) remain vulnerable to adversarial attacks and natural variations. A meta-analysis of 516 published research works showed that less than 6% of AI applications for the diagnostic analysis of medical images incorporated external validation, such as distribution shift tests [15]. Furthermore, according to the FUTURE AI guidelines [16], improving the generalizability and robustness of computer-aided diagnostic systems is essential, particularly in healthcare applications where model safety and reliability are paramount. These vulnerabilities are particularly concerning in critical applications such as healthcare, where errors can have profound consequences for patient outcomes. Adversarial attacks refer to carefully crafted perturbations designed to manipulate the predictions of the model.

artificial intelligence, machine learning, robustness, (18 more...)

2506.17133

Country: North America > United States > Massachusetts (0.28)

Genre:

Research Report > New Finding (0.47)
Instructional Material > Online (0.41)
Instructional Material > Course Syllabus & Notes (0.41)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Thomas, Danielle R., Borchers, Conrad, Bhushan, Shambhavi, Gatz, Erin, Gupta, Shivang, Koedinger, Kenneth R.

LLM-Generated Feedback Supports Learning If Learners Choose to Use It

Large language models (LLMs) are increasingly used to generate feedback, yet their impact on learning remains underexplored, especially compared to existing feedback methods. This study investigates how on-demand LLM-generated explanatory feedback influences learning in seven scenario-based tutor training lessons. Analyzing over 2,600 lesson completions from 885 tutor learners, we compare posttest performance among learners across three groups: learners who received feedback generated by gpt-3.5-turbo, those who declined it, and those without access. All groups received non-LLM corrective feedback. To address potential selection bias-where higher-performing learners may be more inclined to use LLM feedback-we applied propensity scoring. Learners with a higher predicted likelihood of engaging with LLM feedback scored significantly higher at posttest than those with lower propensity. After adjusting for this effect, two out of seven lessons showed statistically significant learning benefits from LLM feedback with standardized effect sizes of 0.28 and 0.33. These moderate effects suggest that the effectiveness of LLM feedback depends on the learners' tendency to seek support. Importantly, LLM feedback did not significantly increase completion time, and learners overwhelmingly rated it as helpful. These findings highlight LLM feedback's potential as a low-cost and scalable way to improve learning on open-ended tasks, particularly in existing systems already providing feedback without LLMs. This work contributes open datasets, LLM prompts, and rubrics to support reproducibility.

large language model, machine learning, natural language, (17 more...)

2506.17006

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Berthon, Antonin, van der Schaar, Mihaela

Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond

Accurately assessing student knowledge is critical for effective education, yet traditional Knowledge Tracing (KT) methods rely on opaque latent embeddings, limiting interpretability. Even LLM-based approaches generate direct predictions or summaries that may hallucinate without any accuracy guarantees. We recast KT as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. By constraining all predictive information to pass through a short natural-language bottleneck, LBMs ensure that the summary contains accurate information while remaining human-interpretable. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories. We demonstrate that training the encoder with group-relative policy optimization, using downstream decoding accuracy as a reward signal, effectively improves summary quality.

large language model, machine learning, natural language, (18 more...)

2506.16982

Country: Europe (0.28)

Genre:

Overview (1.00)
Research Report (0.82)
Instructional Material (0.68)

Industry:

Education > Educational Setting (0.67)
Education > Educational Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Essential-Web v1.0: 24T tokens of organized web data

AI, Essential, :, null, Hojel, Andrew, Pust, Michael, Romanski, Tim, Vanjani, Yash, Kapila, Ritvik, Parmar, Mohit, Chaluvaraju, Adarsh, Tripathy, Alok, Thomas, Anil, Tanwer, Ashish, Shah, Darsh J, Shah, Ishaan, Stratos, Karl, Nguyen, Khoi, Smith, Kurt, Callahan, Michael, Rushton, Peter, Monk, Philip, Mazarakis, Platon, Jamal, Saad, Srivastava, Saurabh, Singla, Somanshu, Vaswani, Ashish

Data plays the most prominent role in how language models acquire skills and knowledge. The lack of massive, well-organized pre-training datasets results in costly and inaccessible data pipelines. We present Essential-Web v1.0, a 24-trillion-token dataset in which every document is annotated with a twelve-category taxonomy covering topic, format, content complexity, and quality. Taxonomy labels are produced by EAI-Distill-0.5b, a fine-tuned 0.5b-parameter model that achieves an annotator agreement within 3% of Qwen2.5-32B-Instruct. With nothing more than SQL-style filters, we obtain competitive web-curated datasets in math (-8.0% relative to SOTA), web code (+14.3%), STEM (+24.5%) and medical (+8.6%). Essential-Web v1.0 is available on HuggingFace: https://huggingface.co/datasets/EssentialAI/essential-web-v1.0

large language model, machine learning, natural language, (21 more...)

2506.14111

Country:

North America > United States (0.67)
Asia (0.46)

Genre:

Research Report > New Finding (0.92)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Information Technology (0.67)
Education > Curriculum > Subject-Specific Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
(2 more...)

arXiv.org Artificial IntelligenceJun-19-2025

Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment

Li, Gen, Chen, Li, Tang, Cheng, Švábenský, Valdemar, Deguchi, Daisuke, Yamashita, Takayoshi, Shimada, Atsushi

We explore the use of Large Language Models (LLMs) for automated assessment of open-text student reflections and prediction of academic performance. Traditional methods for evaluating reflections are time-consuming and may not scale effectively in educational settings. In this work, we employ LLMs to transform student reflections into quantitative scores using two assessment strategies (single-agent and multi-agent) and two prompting techniques (zero-shot and few-shot). Our experiments, conducted on a dataset of 5,278 reflections from 377 students over three academic terms, demonstrate that the single-agent with few-shot strategy achieves the highest match rate with human evaluations. Furthermore, models utilizing LLM-assessed reflection scores outperform baselines in both at-risk student identification and grade prediction tasks. These findings suggest that LLMs can effectively automate reflection assessment, reduce educators' workload, and enable timely support for students who may need additional assistance. Our work emphasizes the potential of integrating advanced generative AI technologies into educational practices to enhance student engagement and academic success.

large language model, machine learning, natural language, (17 more...)

doi: 10.1007/978-981-96-8186-0_24

2504.05716

Country:

Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.15)
Asia > Japan > Honshū > Chūbu (0.14)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.88)

Industry:

Education > Educational Setting (1.00)
Education > Educational Technology > Educational Software (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

arXiv.org Artificial IntelligenceJun-19-2025

Offensive Robot Cybersecurity

Mayoral-Vilches, Víctor

Offensive Robot Cybersecurity introduces a groundbreaking approach by advocating for offensive security methods empowered by means of automation. It emphasizes the necessity of understanding attackers' tactics and identifying vulnerabilities in advance to develop effective defenses, thereby improving robots' security posture. This thesis leverages a decade of robotics experience, employing Machine Learning and Game Theory to streamline the vulnerability identification and exploitation process. Intrinsically, the thesis uncovers a profound connection between robotic architecture and cybersecurity, highlighting that the design and creation aspect of robotics deeply intertwines with its protection against attacks. This duality -- whereby the architecture that shapes robot behavior and capabilities also necessitates a defense mechanism through offensive and defensive cybersecurity strategies -- creates a unique equilibrium. Approaching cybersecurity with a dual perspective of defense and attack, rooted in an understanding of systems architecture, has been pivotal. Through comprehensive analysis, including ethical considerations, the development of security tools, and executing cyber attacks on robot software, hardware, and industry deployments, this thesis proposes a novel architecture for cybersecurity cognitive engines. These engines, powered by advanced game theory and machine learning, pave the way for autonomous offensive cybersecurity strategies for robots, marking a significant shift towards self-defending robotic systems. This research not only underscores the importance of offensive measures in enhancing robot cybersecurity but also sets the stage for future advancements where robots are not just resilient to cyber threats but are equipped to autonomously safeguard themselves.

large language model, machine learning, reinforcement learning, (24 more...)

2506.15343

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre:

Workflow (1.00)
Summary/Review (1.00)
Research Report > Promising Solution (1.00)
(4 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(7 more...)

Biswas, Dipayan, Shah, Shishir, Subhlok, Jaspal

Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos

arXiv.org Artificial IntelligenceJun-18-2025

We introduce the Lecture Video Visual Objects (LVVO) dataset, a new benchmark for visual object detection in educational video content. The dataset consists of 4,000 frames extracted from 245 lecture videos spanning biology, computer science, and geosciences. A subset of 1,000 frames, referred to as LVVO_1k, has been manually annotated with bounding boxes for four visual categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. Each frame was labeled independently by two annotators, resulting in an inter-annotator F1 score of 83.41%, indicating strong agreement. To ensure high-quality consensus annotations, a third expert reviewed and resolved all cases of disagreement through a conflict resolution process. To expand the dataset, a semi-supervised approach was employed to automatically annotate the remaining 3,000 frames, forming LVVO_3k. The complete dataset offers a valuable resource for developing and evaluating both supervised and semi-supervised methods for visual content detection in educational videos. The LVVO dataset is publicly available to support further research in this domain.

annotation, artificial intelligence, dataset, (13 more...)

2506.13657

Genre:

Instructional Material (0.47)
Research Report (0.40)

Industry: Education > Educational Technology > Audio & Video (0.82)

Technology: Information Technology > Artificial Intelligence > Vision (0.62)

arXiv.org Artificial IntelligenceJun-18-2025

AviationLLM: An LLM-based Knowledge System for Aviation Training

Wan, Jia'ang, Shen, Feng, Li, Fujuan, Sun, Yanjin, Li, Yan, Zhang, Shiwen

Aviation training is a core link in ensuring flight safety, improving industry efficiency and promoting sustainable development. It not only involves flight simulation but also requires the learning of a great deal of professional aviation theory knowledge. In the existing training system, the knowledge is mainly imparted by the the instructors. However, the number of instructors is limited and the professional answers obtained from the Internet are not accurate enough, resulting in low training efficiency. To address this, we introduced LLM, but the basic pre-trained model cannot provide accurate answers to professional fields, so we fine-tuned it. Traditional Supervised Fine-Tuning (SFT) risk generating superficially plausible but factually incorrect responses due to insufficient data coverage. To address this, we employ Direct Preference Optimization(DPO). This paper proposes Retrieval-Augmented LLM Alignment via Direct Preference Optimization(RALA-DPO). We select open source pre-trained LLM Qwen and adapt it to aviation theory training through DPO-based domain alignment. Simultaneously, to mitigate hallucinations caused by training data biases, knowledge obsolescence, or domain knowledge gaps, we implement Retrieval-Augmented Generation(RAG) technology that combines generative and retrieval models. RALA-DPO effectively retrieves relevant information from external knowledge bases and delivers precise and high-quality responses through the generative model. Experimental results demonstrate that RALA-DPO can improve accuracy in response to professional aviation knowledge. With integrated RAG mechanisms, this system can further improve the accuracy of answers and achieve zero-cost knowledge updates simultaneously.

large language model, machine learning, natural language, (18 more...)

2506.14336

Country: Asia > China (0.14)

Genre:

Research Report (0.70)
Instructional Material (0.68)

Industry:

Transportation > Air (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Machine LearningJun-17-2025

Statistical Machine Learning for Astronomy -- A Textbook

Ting, Yuan-Sen

This textbook provides a systematic treatment of statistical machine learning for astronomical research through the lens of Bayesian inference, developing a unified framework that reveals connections between modern data analysis techniques and traditional statistical methods. We show how these techniques emerge from familiar statistical foundations. The consistently Bayesian perspective prioritizes uncertainty quantification and statistical rigor essential for scientific inference in astronomy. The textbook progresses from probability theory and Bayesian inference through supervised learning including linear regression with measurement uncertainties, logistic regression, and classification. Unsupervised learning topics cover Principal Component Analysis and clustering methods. We then introduce computational techniques through sampling and Markov Chain Monte Carlo, followed by Gaussian Processes as probabilistic nonparametric methods and neural networks within the broader statistical context. Our theory-focused pedagogical approach derives each method from first principles with complete mathematical development, emphasizing statistical insight and complementing with astronomical applications. We prioritize understanding why algorithms work, when they are appropriate, and how they connect to broader statistical principles. The treatment builds toward modern techniques including neural networks through a solid foundation in classical methods and their theoretical underpinnings. This foundation enables thoughtful application of these methods to astronomical research, ensuring proper consideration of assumptions, limitations, and uncertainty propagation essential for advancing astronomical knowledge in the era of large astronomical surveys.

bayesian inference, book review, machine learning, (20 more...)

arXiv.org Machine Learning

2506.1223

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJun-17-2025

Exploring the Potential of Metacognitive Support Agents for Human-AI Co-Creation

Gmeiner, Frederic, Luo, Kaitao, Wang, Ye, Holstein, Kenneth, Martelaro, Nikolas

Despite the potential of generative AI (GenAI) design tools to enhance design processes, professionals often struggle to integrate AI into their workflows. Fundamental cognitive challenges include the need to specify all design criteria as distinct parameters upfront (intent formulation) and designers' reduced cognitive involvement in the design process due to cognitive offloading, which can lead to insufficient problem exploration, underspecification, and limited ability to evaluate outcomes. Motivated by these challenges, we envision novel metacognitive support agents that assist designers in working more reflectively with GenAI. To explore this vision, we conducted exploratory prototyping through a Wizard of Oz elicitation study with 20 mechanical designers probing multiple metacognitive support strategies. We found that agent-supported users created more feasible designs than non-supported users, with differing impacts between support strategies. Based on these findings, we discuss opportunities and tradeoffs of metacognitive support agents and considerations for future AI-based design tools.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1145/3715336.3735785

2506.12879

Country:

North America > United States (1.00)
Europe (1.00)
North America > Canada > Quebec (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
(2 more...)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)