NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification
Song, Ziyang, Zang, Zelin, Ye, Xiaofan, Xu, Boqiang, Bai, Long, Wu, Jinlin, Ren, Hongliang, Liu, Hongbin, Luo, Jiebo, Lei, Zhen
Multimodal Large Language Models (MLLMs) have shown significant potential in surgical video understanding. With improved zero-shot performance and more effective human-machine interaction, they provide a strong foundation for advancing surgical education and assistance. However, existing research and datasets primarily focus on understanding surgical procedures and workflows, while paying limited attention to the critical role of anatomical comprehension. In clinical practice, surgeons rely heavily on precise anatomical understanding to interpret, review, and learn from surgical videos. To fill this gap, we introduce the Neurosurgical Anatomy Benchmark (NeuroABench), the first multimodal benchmark explicitly created to evaluate anatomical understanding in the neurosurgical domain. NeuroABench consists of 9 hours of annotated neurosurgical videos covering 89 distinct procedures and is developed using a novel multimodal annotation pipeline with multiple review cycles. The benchmark evaluates the identification of 68 clinical anatomical structures, providing a rigorous and standardized framework for assessing model performance. Experiments on over 10 state-of-the-art MLLMs reveal significant limitations, with the best-performing model achieving only 40.87% accuracy in anatomical identification tasks. To further evaluate the benchmark, we extract a subset of the dataset and conduct an informative test with four neurosurgical trainees. The results show that the best-performing trainee achieves 56% accuracy, the lowest 28%, and the group average 46.5%. While the best MLLM performs comparably to the lowest-scoring trainee, it still lags significantly behind the group's average performance. This comparison underscores both the progress of MLLMs in anatomical understanding and the substantial gap that remains in achieving human-level performance.
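The headline numbers above are identification accuracies over the 68 anatomical-structure labels. A minimal sketch of how such an accuracy score can be computed is below; the structure names and predictions are hypothetical illustrations, not NeuroABench data or the authors' code.

```python
# Hedged sketch: per-item identification accuracy as used in
# NeuroABench-style comparisons. All labels below are hypothetical.

def accuracy(predictions, ground_truth):
    """Fraction of anatomical-structure identifications that match."""
    assert len(predictions) == len(ground_truth)
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

gt = ["optic nerve", "dura mater", "pituitary gland", "carotid artery"]
model = ["optic nerve", "arachnoid", "pituitary gland", "basilar artery"]
print(f"accuracy: {accuracy(model, gt):.2%}")  # 2 of 4 correct -> 50.00%
```

The same function applied to a trainee's answer sheet yields the human scores the abstract compares against.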
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Surgery (1.00)
How KPop Demon Hunters Star EJAE Topped the Charts
Kids everywhere know her voice--if not her name. WIRED talks to the former SM trainee about her rise to global superstardom with her hit song "Golden." EJAE, the voice and writing talent behind "Golden," has gone platinum. The night before our interview, the 33-year-old singer-songwriter found out that record sales from the soundtrack had surged past a million units. Jimmy Fallon, of all people, delivered the news alongside a glimmering framed record when she appeared with Audrey Nuna and Rei Ami for the first full live performance of "Golden." Together the trio make up the singing voices of girl group Huntr/x in Netflix's animated musical turned bona fide phenomenon. If you have a kid, you probably don't need a refresher, but the movie follows Huntr/x's Rumi, Mira, and Zoey as they juggle being astronomically famous while moonlighting as demon hunters. That Fallon appearance, and the one that preceded it, might have been the first time that American audiences actually saw (and heard) the human being behind that inescapable song.
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
- Europe > Czechia (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
A Case for Leveraging Generative AI to Expand and Enhance Training in the Provision of Mental Health Services
Lawrence, Hannah R., Stirman, Shannon Wiltsey, Dorison, Samuel, Yun, Taedong, Bell, Megan Jones
Generative artificial intelligence (Generative AI) is transforming healthcare. With this evolution comes optimism regarding the impact it will have on mental health, as well as concern regarding the risks that come with generative AI operating in the mental health domain. Much of the investment in, and academic and public discourse about, AI-powered solutions for mental health has focused on therapist chatbots. Despite the common assumption that chatbots will be the most impactful application of GenAI to mental health, we make the case here for a lower-risk, high impact use case: leveraging generative AI to enhance and scale training in mental health service provision. We highlight key benefits of using generative AI to help train people to provide mental health services and present a real-world case study in which generative AI improved the training of veterans to support one another's mental health. With numerous potential applications of generative AI in mental health, we illustrate why we should invest in using generative AI to support training people in mental health service provision.
- North America > United States > California > Santa Clara County > Mountain View (0.05)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology
Datta, Suvrankar, Buchireddygari, Divya, Kaza, Lakshmi Vennela Chowdary, Bhalke, Mrudula, Singh, Kautik, Pandey, Ayush, Vasipalli, Sonit Sai, Karnwal, Upasana, Bhatti, Hakikat Bir Singh, Maroo, Bhavya Ratan, Hebbar, Sanjana, Joseph, Rahul, Kaur, Gurkawal, Singh, Devyani, V, Akhil, Prasad, Dheeksha Devasya Shama, Mahajan, Nishtha, Arisha, Ayinaparthi, Vanagundi, Rajesh, Nandy, Reet, Vuthoo, Kartik, Rajvanshi, Snigdhaa, Kondaveeti, Nikhileswar, Gunjal, Suyash, Jain, Rishabh, Jain, Rajat, Agrawal, Anurag
Generalist multimodal AI systems such as large language models (LLMs) and vision language models (VLMs) are increasingly accessed by clinicians and patients alike for medical image interpretation through widely available consumer-facing chatbots. Most evaluations claiming expert-level performance are on public datasets containing common pathologies. Rigorous evaluation of frontier models on difficult diagnostic cases remains limited. We developed a pilot benchmark of 50 expert-level "spot diagnosis" cases across multiple imaging modalities to evaluate the performance of frontier AI models against board-certified radiologists and radiology trainees. To mirror real-world usage, the reasoning modes of five popular frontier AI models were tested through their native web interfaces, viz. OpenAI o3, OpenAI GPT-5, Gemini 2.5 Pro, Grok-4, and Claude Opus 4.1. Accuracy was scored by blinded experts, and reproducibility was assessed across three independent runs. GPT-5 was additionally evaluated across various reasoning modes. Reasoning quality errors were assessed and a taxonomy of visual reasoning errors was defined. Board-certified radiologists achieved the highest diagnostic accuracy (83%), outperforming trainees (45%) and all AI models (best performance shown by GPT-5: 30%). Reliability was substantial for GPT-5 and o3, moderate for Gemini 2.5 Pro and Grok-4, and poor for Claude Opus 4.1. These findings demonstrate that advanced frontier models fall far short of radiologists in challenging diagnostic cases. Our benchmark highlights the present limitations of generalist AI in medical imaging and cautions against unsupervised clinical use. We also provide a qualitative analysis of reasoning traces and propose a practical taxonomy of visual reasoning errors by AI models for better understanding their failure modes, informing evaluation standards and guiding more robust model development.
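The abstract reports reproducibility across three independent runs per model. One simple way to operationalize that (an assumption for illustration, not the study's actual scoring protocol, which used blinded expert grading) is the fraction of cases on which all runs return the same diagnosis:

```python
# Hedged sketch: run-to-run consistency as the fraction of cases where
# all three independent runs agree. Case answers below are hypothetical.

def reproducibility(runs):
    """runs: list of per-run answer lists, aligned by case index."""
    n_cases = len(runs[0])
    consistent = sum(
        len({run[i] for run in runs}) == 1 for i in range(n_cases)
    )
    return consistent / n_cases

runs = [
    ["osteosarcoma", "achalasia", "moyamoya", "scurvy"],
    ["osteosarcoma", "achalasia", "moyamoya", "rickets"],
    ["osteosarcoma", "GERD", "moyamoya", "scurvy"],
]
print(reproducibility(runs))  # 2 of 4 cases fully consistent -> 0.5
```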
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
Training the next generation of physicians for artificial intelligence-assisted clinical neuroradiology: ASNR MICCAI Brain Tumor Segmentation (BraTS) 2025 Lighthouse Challenge education platform
Amiruddin, Raisa, Yordanov, Nikolay Y., Maleki, Nazanin, Fehringer, Pascal, Gkampenis, Athanasios, Janas, Anastasia, Krantchev, Kiril, Moawad, Ahmed, Umeh, Fabian, Abosabie, Salma, Abosabie, Sara, Alotaibi, Albara, Ghonim, Mohamed, Ghonim, Mohanad, Mhana, Sedra Abou Ali, Page, Nathan, Jakovljevic, Marko, Sharifi, Yasaman, Bhatia, Prisha, Manteghinejad, Amirreza, Guelen, Melisa, Veronesi, Michael, Hill, Virginia, So, Tiffany, Krycia, Mark, Petrovic, Bojan, Memon, Fatima, Cramer, Justin, Schrickel, Elizabeth, Kosovic, Vilma, Vidal, Lorenna, Thompson, Gerard, Ikuta, Ichiro, Albalooshy, Basimah, Nabavizadeh, Ali, Tahon, Nourel Hoda, Shekdar, Karuna, Bhatia, Aashim, Kirsch, Claudia, D'Anna, Gennaro, Lohmann, Philipp, Nour, Amal Saleh, Myronenko, Andriy, Goldman-Yassen, Adam, Reid, Janet R., Aneja, Sanjay, Bakas, Spyridon, Aboian, Mariam
High-quality reference-standard image data creation by neuroradiology experts for automated clinical tools can be a powerful vehicle for neuroradiology & artificial intelligence education. We developed a multimodal educational approach for students and trainees during the MICCAI Brain Tumor Segmentation Lighthouse Challenge 2025, a landmark initiative to develop accurate brain tumor segmentation algorithms. Fifty-six medical students & radiology trainees volunteered to annotate brain tumor MR images for the BraTS challenges of 2023 & 2024, guided by faculty-led didactics on neuropathology MRI. Among the 56 annotators, 14 select volunteers were then paired with neuroradiology faculty for guided one-on-one annotation sessions for BraTS 2025. Lectures on neuroanatomy, pathology & AI, journal clubs & data scientist-led workshops were organized online. Annotators & audience members completed surveys on their perceived knowledge before & after the annotations & lectures, respectively. Fourteen coordinators, each paired with a neuroradiologist, completed the data annotation process, averaging 1322.9+/-760.7 hours per dataset per pair and 1200 segmentations in total. On a scale of 1-10, annotation coordinators reported a significant increase in familiarity with image segmentation software from pre- to post-annotation, from an initial average of 6+/-2.9 to a final average of 8.9+/-1.1, and a significant increase in familiarity with brain tumor features from pre- to post-annotation, from an initial average of 6.2+/-2.4 to a final average of 8.1+/-1.2. We demonstrate an innovative offering for providing neuroradiology & AI education through an image segmentation challenge to enhance understanding of algorithm development, reinforce the concept of a data reference standard, and diversify opportunities for AI-driven image analysis among future physicians.
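The familiarity results above are reported as mean +/- standard deviation on a 1-10 self-rating scale. A minimal sketch of that summary, using hypothetical ratings rather than the study's survey data:

```python
# Hedged sketch: summarizing pre/post self-reported familiarity (1-10)
# as mean +/- sample standard deviation, mirroring how the abstract
# reports e.g. 6+/-2.9 -> 8.9+/-1.1. Ratings below are hypothetical.

import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) of the ratings."""
    return statistics.mean(scores), statistics.stdev(scores)

pre = [3, 5, 6, 8, 9, 4, 7, 6]    # hypothetical pre-annotation ratings
post = [8, 9, 9, 10, 9, 7, 9, 9]  # hypothetical post-annotation ratings

for label, scores in (("pre", pre), ("post", post)):
    m, s = summarize(scores)
    print(f"{label}: {m:.1f} +/- {s:.1f}")
```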
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > South Carolina > Charleston County > Charleston (0.14)
- North America > United States > Missouri > Boone County > Columbia (0.14)
- (29 more...)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report (0.83)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
RadGame: An AI-Powered Platform for Radiology Education
Baharoon, Mohammed, Raissi, Siavash, Jun, John S., Heintz, Thibault, Alabbad, Mahmoud, Alburkani, Ali, Kim, Sung Eun, Kleinschmidt, Kent, Alhumaydhi, Abdulrahman O., Alghamdi, Mohannad Mohammed G., Palacio, Jeremy Francis, Bukhaytan, Mohammed, Prudlo, Noah Michael, Akula, Rithvik, Chrisler, Brady, Galligos, Benjamin, Almutairi, Mohammed O., Alanazi, Mazeen Mohammed, Alrashdi, Nasser M., Hwang, Joel Jihwan, Jaliparthi, Sri Sai Dinesh, Nelson, Luke David, Nguyen, Nathaniel, Suryadevara, Sathvik, Kim, Steven, Mohammed, Mohammed F., Semenov, Yevgeniy R., Yu, Kun-Hsing, Aljouie, Abdulrhman, AlOmaish, Hassan, Rodman, Adam, Rajpurkar, Pranav
We introduce RadGame, an AI-powered gamified platform for radiology education that targets two core skills: localizing findings and generating reports. Traditional radiology training is based on passive exposure to cases or active practice with real-time input from supervising radiologists, limiting opportunities for immediate and scalable feedback. RadGame addresses this gap by combining gamification with large-scale public datasets and automated, AI-driven feedback that provides clear, structured guidance to human learners. In RadGame Localize, players draw bounding boxes around abnormalities, which are automatically compared to radiologist-drawn annotations from public datasets, and vision-language models generate visual explanations for findings the user missed. In RadGame Report, players compose findings given a chest X-ray, patient age, and indication, and receive structured AI feedback based on radiology report generation metrics, highlighting errors and omissions compared to a radiologist's written ground-truth report from public datasets, producing a final performance and style score. In a prospective evaluation, participants using RadGame achieved a 68% improvement in localization accuracy compared to 17% with traditional passive methods, and a 31% improvement in report-writing accuracy compared to 4% with traditional methods after seeing the same cases. RadGame highlights the potential of AI-driven gamification to deliver scalable, feedback-rich radiology training and reimagines the application of medical AI resources in education.
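The abstract does not say how player boxes are matched to radiologist annotations; a standard choice for such comparisons is intersection-over-union (IoU), sketched below as an assumption. Boxes are (x1, y1, x2, y2) corner coordinates, and both boxes are hypothetical.

```python
# Hedged sketch (not RadGame's actual scoring code): comparing a
# player-drawn bounding box to a reference annotation via IoU.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union else 0.0

player = (10, 10, 50, 50)     # hypothetical player-drawn box
reference = (30, 30, 70, 70)  # hypothetical radiologist annotation
print(round(iou(player, reference), 3))  # -> 0.143
```

A localization would then count as correct when the IoU exceeds a chosen threshold (commonly 0.5).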
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.05)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Applied AI (0.68)
A Human-Centered Approach to Identifying Promises, Risks, & Challenges of Text-to-Image Generative AI in Radiology
Morrison, Katelyn, Mathur, Arpit, Bradshaw, Aidan, Wartmann, Tom, Lundi, Steven, Zandifar, Afrooz, Dai, Weichang, Batmanghelich, Kayhan, Eslami, Motahhare, Perer, Adam
As text-to-image generative models rapidly improve, AI researchers are making significant advances in developing domain-specific models capable of generating complex medical imagery from text prompts. Despite this, these technical advancements have overlooked whether and how medical professionals would benefit from and use text-to-image generative AI (GenAI) in practice. By developing domain-specific GenAI without involving stakeholders, we risk building models that are not useful or are even more harmful than helpful. In this paper, we adopt a human-centered approach to responsible model development by involving stakeholders in evaluating and reflecting on the promises, risks, and challenges of a novel text-to-CT Scan GenAI model. Through exploratory model prompting activities, we uncover the perspectives of medical students, radiology trainees, and radiologists on the role that text-to-CT Scan GenAI can play across medical education, training, and practice. This human-centered approach additionally enabled us to surface technical challenges and domain-specific risks of generating synthetic medical images. We conclude by reflecting on the implications of medical text-to-image GenAI.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Oceania > New Zealand (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (2 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
- Instructional Material > Course Syllabus & Notes (0.93)
- Personal > Interview (0.68)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Education > Curriculum > Subject-Specific Education > Professional (0.34)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Hand by Hand: LLM Driving EMS Assistant for Operational Skill Learning
Xiang, Wei, Lei, Ziyue, Che, Haoyuan, Ye, Fangyuan, Wu, Xueting, Sun, Lingyun
Operational skill learning, inherently physical and reliant on hands-on practice and kinesthetic feedback, has yet to be effectively replicated in large language model (LLM)-supported training. Current LLM training assistants primarily generate customized textual feedback, neglecting the crucial kinesthetic modality. This gap derives from the textual and uncertain nature of LLMs, compounded by concerns about user acceptance of LLM-driven body control. To bridge this gap and realize the potential of collaborative human-LLM action, this work explores human experience of LLM-driven kinesthetic assistance. Specifically, we introduced an "Align-Analyze-Adjust" strategy and developed FlightAxis, a tool that integrates LLM with Electrical Muscle Stimulation (EMS) for flight skill acquisition, a representative operational skill domain. FlightAxis learns flight skills from manuals and guides forearm movements during simulated flight tasks. Our results demonstrate high user acceptance of LLM-mediated body control and significantly reduced task completion times. Crucially, trainees reported that this kinesthetic assistance enhanced their awareness of operation flaws and fostered increased engagement in the training process, rather than relieving perceived load. This work demonstrates the potential of kinesthetic LLM training in operational skill acquisition.
- North America > United States > District of Columbia > Washington (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Transportation > Air (1.00)
- Government (0.94)
- Health & Medicine > Therapeutic Area (0.68)
- Education > Curriculum > Subject-Specific Education (0.68)
Explainable AI for Automated User-specific Feedback in Surgical Skill Acquisition
Gomez, Catalina, Seenivasan, Lalithkumar, Zou, Xinrui, Yoon, Jeewoo, Chu, Sirui, Leong, Ariel, Kramer, Patrick, Ku, Yu-Chun, Porras, Jose L., Martin-Gomez, Alejandro, Ishii, Masaru, Unberath, Mathias
Traditional surgical skill acquisition relies heavily on expert feedback, yet direct access is limited by faculty availability and variability in subjective assessments. While trainees can practice independently, the lack of personalized, objective, and quantitative feedback reduces the effectiveness of self-directed learning. Recent advances in computer vision and machine learning have enabled automated surgical skill assessment, demonstrating the feasibility of automatic competency evaluation. However, it is unclear whether such Artificial Intelligence (AI)-driven feedback can contribute to skill acquisition. Here, we examine the effectiveness of explainable AI (XAI)-generated feedback in surgical training through a human-AI study. We create a simulation-based training framework that utilizes XAI to analyze videos and extract surgical skill proxies related to primitive actions. Our intervention provides automated, user-specific feedback by comparing trainee performance to expert benchmarks and highlighting deviations from optimal execution through understandable proxies for actionable guidance. In a prospective user study with medical students, we compare the impact of XAI-guided feedback against traditional video-based coaching on task outcomes, cognitive load, and trainees' perceptions of AI-assisted learning. Results showed reduced cognitive load and improved confidence post-intervention. While no differences emerged between the two feedback types in reducing performance gaps or practice adjustments, trends in the XAI group revealed desirable effects where participants more closely mimicked expert practice. This work encourages the study of explainable AI in surgical education and the development of data-driven, adaptive feedback mechanisms that could transform learning experiences and competency assessment.
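The feedback mechanism described above compares trainee skill proxies to expert benchmarks and surfaces the largest deviations. A minimal sketch of that comparison step is below; the proxy names and values are hypothetical, not the authors' actual proxies or pipeline.

```python
# Hedged sketch: ranking a trainee's deviations from an expert benchmark
# so the largest gaps can be turned into feedback. All proxies below are
# hypothetical illustrations.

expert = {"needle_angle_deg": 45.0, "path_length_cm": 12.0, "idle_time_s": 3.0}
trainee = {"needle_angle_deg": 62.0, "path_length_cm": 19.5, "idle_time_s": 3.2}

def deviations(trainee, expert):
    """Relative deviation of each proxy from the expert benchmark."""
    return {k: (trainee[k] - expert[k]) / expert[k] for k in expert}

# Report proxies from largest to smallest absolute deviation.
for proxy, dev in sorted(deviations(trainee, expert).items(),
                         key=lambda kv: -abs(kv[1])):
    print(f"{proxy}: {dev:+.1%} vs expert")
```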
- North America > United States > Arkansas > Washington County > Fayetteville (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Surgery (0.67)
- Education > Curriculum > Subject-Specific Education (0.55)
- Education > Educational Technology > Educational Software > Computer Based Training (0.34)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Vision (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.81)
Probing Experts' Perspectives on AI-Assisted Public Speaking Training
Fourati, Nesrine, Barkar, Alisa, Dragée, Marion, Danthon-Lefebvre, Liv, Chollet, Mathieu
Background: Public speaking is a vital professional skill, yet it remains a source of significant anxiety for many individuals. Traditional training relies heavily on expert coaching, but recent advances in AI have led to novel types of commercial automated public speaking feedback tools. However, most research has focused on prototypes rather than commercial applications, and little is known about how public speaking experts perceive these tools. Objectives: This study aims to evaluate expert opinions on the efficacy and design of commercial AI-based public speaking training tools and to propose guidelines for their improvement. Methods: The research involved 16 semi-structured interviews and 2 focus groups with public speaking experts. Participants discussed their views on current commercial tools, their potential integration into traditional coaching, and suggestions for enhancing these systems. Results and Conclusions: Experts acknowledged the value of AI tools in handling repetitive, technical aspects of training, allowing coaches to focus on higher-level skills. However, they identified key issues in current tools, emphasising the need for personalised, understandable, carefully selected feedback and clear instructional design. Overall, they supported a hybrid model combining traditional coaching with AI-supported exercises.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
- (3 more...)
- Research Report (1.00)
- Questionnaire & Opinion Survey (1.00)
- Instructional Material (1.00)
- Personal > Interview (0.66)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)
- Education > Educational Technology > Educational Software > Computer Based Training (0.46)
- Education > Educational Setting > Online (0.46)