medical professional
AI deepfakes of real doctors spreading health misinformation on social media
An investigation found that real video of medical professionals is being manipulated using AI. TikTok and other social media platforms are hosting AI-generated deepfake videos of doctors whose words have been manipulated to help sell supplements and spread health misinformation. The factchecking organisation Full Fact has uncovered hundreds of such videos featuring impersonated versions of doctors and influencers directing viewers to Wellness Nest, a US-based supplements firm. All the deepfakes involve real footage of a health expert taken from the internet.
- North America > United States (0.15)
- Europe > United Kingdom > Wales (0.05)
- Europe > United Kingdom > Scotland (0.05)
- (4 more...)
- Media > News (1.00)
- Health & Medicine (1.00)
- Government > Regional Government (1.00)
- Information Technology > Security & Privacy (0.89)
REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks
Wang, Linna, You, Zhixuan, Zhang, Qihui, Wen, Jiunan, Shi, Ji, Chen, Yimin, Wang, Yusen, Ding, Fanqi, Feng, Ziliang, Lu, Li
Large Language Models (LLMs) and causal learning each hold strong potential for clinical decision making (CDM). However, their synergy remains poorly understood, largely due to the lack of systematic benchmarks evaluating their integration in clinical risk prediction. In real-world healthcare, identifying features with causal influence on outcomes is crucial for actionable and trustworthy predictions. While recent work highlights LLMs' emerging causal reasoning abilities, comprehensive benchmarks are lacking for assessing their causal learning and their performance when informed by causal features in clinical risk prediction. To address this, we introduce REACT-LLM, a benchmark designed to evaluate whether combining LLMs with causal features can enhance clinical prognostic performance and potentially outperform traditional machine learning (ML) methods. Unlike existing LLM-clinical benchmarks that often focus on a limited set of outcomes, REACT-LLM evaluates 7 clinical outcomes across 2 real-world datasets, comparing 15 prominent LLMs, 6 traditional ML models, and 3 causal discovery (CD) algorithms. Our findings indicate that while LLMs perform reasonably in clinical prognostics, they have not yet outperformed traditional ML models. Integrating causal features derived from CD algorithms into LLMs offers limited performance gains, primarily due to the strict assumptions of many CD methods, which are often violated in complex clinical data. While the direct integration yields limited improvement, our benchmark reveals a more promising synergy.
- Asia > Middle East > Israel (0.04)
- Asia > China > Yunnan Province > Kunming (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
REMONI: An Autonomous System Integrating Wearables and Multimodal Large Language Models for Enhanced Remote Health Monitoring
Ho, Thanh Cong, Kharrat, Farah, Abid, Abderrazek, Karray, Fakhri
With the widespread adoption of wearable devices in our daily lives, the demand and appeal for remote patient monitoring have significantly increased. Most research in this field has concentrated on collecting sensor data, visualizing it, and analyzing it to detect anomalies in specific diseases such as diabetes, heart disease and depression. However, this domain has a notable gap in the aspect of human-machine interaction. This paper proposes REMONI, an autonomous REmote health MONItoring system that integrates multimodal large language models (MLLMs), the Internet of Things (IoT), and wearable devices. The system automatically and continuously collects vital signs, accelerometer data from a special wearable (such as a smartwatch), and visual data in patient video clips collected from cameras. This data is processed by an anomaly detection module, which includes a fall detection model and algorithms to identify emergency conditions and alert caregivers. A distinctive feature of our proposed system is the natural language processing component, developed with MLLMs capable of detecting and recognizing a patient's activity and emotion while responding to healthcare workers' inquiries. Additionally, prompt engineering is employed to integrate all patient information seamlessly. As a result, doctors and nurses can access real-time vital signs and the patient's current state and mood by interacting with an intelligent agent through a user-friendly web application. Our experiments demonstrate that our system is implementable and scalable for real-life scenarios, potentially reducing the workload of medical professionals and healthcare costs. A full-fledged prototype illustrating the functionalities of the system has been developed and is being tested to demonstrate the robustness of its various capabilities.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Diagnostic Medicine > Vital Signs (0.90)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)
Dr. GPT Will See You Now, but Should It? Exploring the Benefits and Harms of Large Language Models in Medical Diagnosis using Crowdsourced Clinical Cases
Mingole, Bonam, Majumdar, Aditya, Choudhury, Firdaus Ahmed, Kraschnewski, Jennifer L., Sundar, Shyam S., Yadav, Amulya
The proliferation of Large Language Models (LLMs) in high-stakes applications such as medical (self-)diagnosis and preliminary triage raises significant ethical and practical concerns about the effectiveness, appropriateness, and possible harmfulness of the use of these technologies for health-related concerns and queries. Some prior work has considered the effectiveness of LLMs in answering expert-written health queries/prompts, questions from medical examination banks, or queries based on pre-existing clinical cases. Unfortunately, these existing studies completely ignore an in-the-wild evaluation of the effectiveness of LLMs in answering everyday health concerns and queries typically asked by general users, which corresponds to the more prevalent use case for LLMs. To address this research gap, this paper presents the findings from a university-level competition that leveraged a novel, crowdsourced approach for evaluating the effectiveness of LLMs in answering everyday health queries. Over the course of a week, a total of 34 participants prompted four publicly accessible LLMs with 212 real (or imagined) health concerns, and the LLM-generated responses were evaluated by a team of nine board-certified physicians. At a high level, our findings indicate that on average, 76% of the 212 LLM responses were deemed to be accurate by physicians. Further, with the help of medical professionals, we investigated whether RAG versions of these LLMs (powered with a comprehensive medical knowledge base) can improve the quality of responses generated by LLMs. Finally, we also derive qualitative insights to explain our quantitative findings by conducting interviews with seven medical professionals who were shown all the prompts in our competition. This paper aims to provide a more grounded understanding of how LLMs perform in real-world everyday health communication.
- Europe > Austria > Vienna (0.14)
- North America > United States > Pennsylvania (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
The future of Apple Vision Pro is in medicine
Apple's $3,500 Vision Pro sounds like a bargain compared to the price of a fresh, medical-grade cadaver. And some medical institutions have started practicing surgery using the spatial-computing headset, which doesn't require a physical human body. Replacing cadavers is just one example of how the Vision Pro has made its way into the medical field since it hit the market in February 2024. On January 30-31, 2025, Sharp Healthcare hosted the inaugural Spatial Computing Health Care Summit, where medical providers gathered to discuss their use of spatial computing, which embeds digital objects into a live feed of the real world. The same tech that allows people to play virtual Battleship with each other has moved into applications that include everything from training and education to full-fledged operations on human patients.
- Health & Medicine > Surgery (0.52)
- Education > Curriculum > Subject-Specific Education (0.36)
The Doctor Behind the 'Suicide Pod' Wants AI to Assist at the End of Life
The world's first assisted suicide pod wraps around the human body like a space capsule, tilting gently towards the sky. The device is designed to look as if the person inside is embarking on a journey, says its inventor, the Australian right-to-die activist Philip Nitschke. "It gives you the idea you're saying goodbye to the world." Last month, the 3D-printed pod was used for the first time. In a forest on the Swiss-German border, an unnamed 64-year-old American woman pressed the pod's button to release deadly nitrogen gas.
Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation
Krolik, Jack, Mahal, Herprit, Ahmad, Feroz, Trivedi, Gaurav, Saket, Bahador
This paper explores the potential of using Large Language Models (LLMs) to automate the evaluation of responses in medical Question and Answer (Q&A) systems, a crucial form of Natural Language Processing. Traditionally, human evaluation has been indispensable for assessing the quality of these responses. However, manual evaluation by medical professionals is time-consuming and costly. Our study examines whether LLMs can reliably replicate human evaluations by using questions derived from patient data, thereby saving valuable time for medical experts. While the findings suggest promising results, further research is needed to address more specific or complex questions that were beyond the scope of this initial investigation.
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.48)
Case-based reasoning approach for diagnostic screening of children with developmental delays
Song, Zichen, Li, Jiakang, Lai, Songning, Huang, Sitan
According to the World Health Organization, children with developmental delays constitute approximately 6% to 9% of the total population. Based on the number of newborns in Huaibei, Anhui Province, China, in 2023 (94,420), it is estimated that there are about 7,500 suspected cases of developmental delays annually. Early identification and appropriate early intervention for these children can significantly reduce the wastage of medical resources and societal costs. International research indicates that the optimal period for intervention in children with developmental delays is before the age of six, with the golden treatment period being before three and a half years of age. Studies have shown that children with developmental delays who receive early intervention exhibit significant improvement in symptoms; some may even fully recover. This research adopts a hybrid model combining a CNN-Transformer model with Case-Based Reasoning (CBR) to enhance the screening efficiency for children with developmental delays. The CNN-Transformer model is an excellent model for image feature extraction and recognition, effectively identifying features in bone age images to determine bone age. CBR is a technique for solving problems based on similar cases; it solves current problems based on past experiences, similar to how humans solve problems through learning from experience. Given CBR's memory capability to judge and compare new cases against previously stored old cases, it is suitable for application in support systems with latent and variable characteristics. Therefore, this study utilizes the CNN-Transformer-CBR to establish a screening system for children with developmental delays, aiming to improve screening efficiency.
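The retrieval step at the heart of case-based reasoning, as the abstract describes it (judging a new case by its similarity to previously stored cases), can be sketched as a nearest-neighbor lookup. This is only an illustrative sketch: the feature vectors and labels below are hypothetical, and the paper's actual system derives its features from a CNN-Transformer over bone-age images.

```python
import math

# Hypothetical case base: each stored case is (feature_vector, outcome_label).
case_base = [
    ([4.2, 0.8, 1.1], "refer_for_assessment"),
    ([2.0, 0.1, 0.3], "typical_development"),
    ([3.9, 0.7, 0.9], "refer_for_assessment"),
]

def retrieve(query, cases):
    """Return the outcome of the most similar stored case (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(cases, key=lambda case: dist(query, case[0]))[1]

# A new case closest to an already-referred case inherits that outcome:
print(retrieve([4.0, 0.75, 1.0], case_base))  # -> refer_for_assessment
```

Real CBR systems add adaptation and retention steps on top of this retrieval, but the retrieve-by-similarity core is what lets the system reuse past screening decisions.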
- Asia > China > Anhui Province (0.24)
- Asia > Taiwan (0.05)
- North America > United States > Oregon > Lane County > Eugene (0.04)
- North America > United States > California (0.04)
Evaluating the Explainable AI Method Grad-CAM for Breath Classification on Newborn Time Series Data
Oprea, Camelia, Grüne, Mike, Buglowski, Mateusz, Olivier, Lena, Orlikowsky, Thorsten, Kowalewski, Stefan, Schoberer, Mark, Stollenwerk, André
With the digitalization of health care systems, artificial intelligence becomes more present in medicine. Machine learning in particular shows great potential for complex tasks such as time series classification, usually at the cost of transparency and comprehensibility. This leads to a lack of trust by humans and thus hinders its active usage. Explainable artificial intelligence tries to close this gap by providing insight into the decision-making process; however, the actual usefulness of its different methods remains unclear. This paper proposes a user-study-based evaluation of the explanation method Grad-CAM with application to a neural network for the classification of breaths in time series neonatal ventilation data. We present the perceived usefulness of the explainability method by different stakeholders, exposing the difficulty of achieving actual transparency and the wish for more in-depth explanations by many of the participants.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- Questionnaire & Opinion Survey (0.70)
- Research Report (0.64)
Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis
Xu, Shaochen, Wu, Zihao, Zhao, Huaqin, Shu, Peng, Liu, Zhengliang, Liao, Wenxiong, Li, Sheng, Sikora, Andrea, Liu, Tianming, Li, Xiang
The analysis of medical texts is a key component of healthcare informatics, where the accurate comparison and interpretation of documents can significantly impact patient care and medical research. Traditionally, this analysis has leveraged lexical comparison metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [1] and BLEU (Bilingual Evaluation Understudy) [2], which have become standard tools in the evaluation of text similarity within the domain of natural language processing (NLP). ROUGE and BLEU were initially designed to assess the quality of automatic summarization and machine translation respectively, by measuring the overlap of n-grams between the generated texts and reference texts. While these metrics have been instrumental in advancing NLP applications, their application in medical text analysis reveals inherent limitations. Specifically, ROUGE and BLEU focus predominantly on surface-level lexical similarities, often overlooking the deep semantic meanings and clinical implications embedded within medical documents. This gap in capturing the essence and context of medical language presents a significant challenge in leveraging these metrics for meaningful analysis in healthcare. Recognizing these limitations, this research proposes a novel methodology that employs GPT-4, a state-of-the-art large language model, for a more sophisticated analysis of medical texts. GPT-4's advanced understanding of context and semantics [3, 4, 5] offers an opportunity to transcend the boundaries of traditional lexical analysis, enabling a deeper, more meaningful comparison of medical documents [6, 7]. This approach not only addresses the shortcomings of ROUGE and BLEU but also aligns with the evolving needs of medical data analysis, where the accurate interpretation of texts is paramount.
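The surface-level limitation the abstract describes can be made concrete with a minimal n-gram overlap sketch (illustrative only, not the official ROUGE or BLEU implementations): two clinically equivalent sentences can share almost no bigrams, so an overlap-based score rates them as dissimilar.

```python
from collections import Counter

def ngram_overlap(candidate, reference, n=2):
    """Fraction of reference n-grams also present in the candidate
    (a ROUGE-n-style recall; a simplified sketch, not the official metric)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    matched = sum(min(count, ref[g]) for g, count in cand.items() if g in ref)
    total = sum(ref.values())
    return matched / total if total else 0.0

# Clinically equivalent statements with little lexical overlap:
reference = "the patient suffered a myocardial infarction"
candidate = "the patient had a heart attack"
print(ngram_overlap(candidate, reference))  # -> 0.2 (only "the patient" matches)
```

A semantics-aware comparison, such as the LLM-based approach the paper proposes, would treat these two statements as near-equivalent despite the low lexical score.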
- North America > United States > Virginia (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > Massachusetts (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)