FDA
Fox News AI Newsletter: FDA approves cancer-fighting tech tool
Senior medical analyst Dr. Marc Siegel discusses advancements in artificial intelligence aimed at predicting an individual's future risk of breast cancer and the increased health risks from cannabis as users age. SMARTER SCREENINGS: The U.S. Food and Drug Administration (FDA) has approved the first artificial intelligence (AI) tool to predict breast cancer risk. NOVA IN ACTION: Flock Safety has released another piece of revolutionary technology aimed at keeping everyday civilians safe from crime. The company's new product, Flock Nova, helps law enforcement with a common but often overlooked problem – a lack of data sharing and access. ROBOT NURSES RISING: The global healthcare system is expected to face a shortage of 4.5 million nurses by 2030, with burnout identified as a leading cause for this deficit.
FDA approves first AI tool to predict breast cancer risk
Senior medical analyst Dr. Marc Siegel discusses advancements in artificial intelligence aimed at predicting an individual's future risk of breast cancer and the increased health risks from cannabis as users age. The U.S. Food and Drug Administration (FDA) has approved the first artificial intelligence (AI) tool to predict breast cancer risk. The authorization was confirmed by digital health tech company Clairity, the developer of Clairity Breast – a novel, image-based prognostic platform designed to predict five-year breast cancer risk from a routine screening mammogram. In a press release, Clairity shared its plans to launch the AI platform across health systems through 2025. Most risk assessment models for breast cancer rely heavily on age and family history, according to Clairity.
Recent Developments in GNNs for Drug Discovery
Fang, Zhengyu, Zhang, Xiaoge, Zhao, Anyin, Li, Xiao, Chen, Huiyuan, Li, Jing
It is well known that traditional drug discovery is costly, time-consuming, and prone to high failure rates [1]. To streamline the process of drug discovery and mitigate resource-intensive laboratory work, significant research has been dedicated to the development of computational methods. Existing literature provides some comprehensive reviews on deep learning approaches in drug discovery [2, 3, 4, 5]. In this review, we focus on the development and applications of Graph Neural Networks (GNNs) in three related areas of computational drug development, namely, Molecule Generation, Molecular Property Prediction, and Drug-Drug Interaction Prediction, which not only receive increasing attention but also show promising results. We summarize some of the most recent developments in these research areas and focus on computational advances published since 2021.
Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
Huang, Chen, Seto, Skyler, Pouransari, Hadi, Farajtabar, Mehrdad, Vemulapalli, Raviteja, Faghri, Fartash, Tuzel, Oncel, Theobald, Barry-John, Susskind, Josh
Vision foundation models pre-trained on massive data encode rich representations of real-world concepts, which can be adapted to downstream tasks by fine-tuning. However, fine-tuning foundation models on one task often leads to the issue of concept forgetting on other tasks. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. Knowledge is often preserved by matching the original and fine-tuned model weights or feature pairs. However, such point-wise matching can be too strong, without explicit awareness of the feature neighborhood structures that encode rich knowledge as well. We propose a novel regularization method Proxy-FDA that explicitly preserves the structural knowledge in feature space. Proxy-FDA performs Feature Distribution Alignment (using nearest neighbor graphs) between the pre-trained and fine-tuned feature spaces, and the alignment is further improved by informative proxies that are generated dynamically to increase data diversity. Experiments show that Proxy-FDA significantly reduces concept forgetting during fine-tuning, and we find a strong correlation between forgetting and a distributional distance metric (in comparison to L2 distance). We further demonstrate Proxy-FDA's benefits in various fine-tuning settings (end-to-end, few-shot and continual tuning) and across different tasks like image classification, captioning and VQA.
Can Large Language Models Match the Conclusions of Systematic Reviews?
Polzak, Christopher, Lozano, Alejandro, Sun, Min Woo, Burgess, James, Zhang, Yuhui, Wu, Kevin, Yeung-Levy, Serena
Systematic reviews (SR), in which experts summarize and analyze evidence across individual studies to provide insights on a specialized topic, are a cornerstone for evidence-based clinical decision-making, research, and policy. Given the exponential growth of scientific articles, there is growing interest in using large language models (LLMs) to automate SR generation. However, the ability of LLMs to critically assess evidence and reason across multiple documents to provide recommendations at the same proficiency as domain experts remains poorly characterized. We therefore ask: Can LLMs match the conclusions of systematic reviews written by clinical experts when given access to the same studies? To explore this question, we present MedEvidence, a benchmark pairing findings from 100 SRs with the studies they are based on. We benchmark 24 LLMs on MedEvidence, including reasoning, non-reasoning, medical specialist, and models across varying sizes (from 7B-700B). Through our systematic evaluation, we find that reasoning does not necessarily improve performance, larger models do not consistently yield greater gains, and knowledge-based fine-tuning degrades accuracy on MedEvidence. Instead, most models exhibit similar behavior: performance tends to degrade as token length increases, their responses show overconfidence, and, contrary to human experts, all models show a lack of scientific skepticism toward low-quality findings. These results suggest that more work is still required before LLMs can reliably match the observations from expert-conducted SRs, even though these systems are already deployed and being used by clinicians. We release our codebase and benchmark to the broader research community to further investigate LLM-based SR systems.
UniTox: Leveraging LLMs to Curate a Unified Dataset of Drug-Induced Toxicity from FDA Labels
Drug-induced toxicity is one of the leading reasons new drugs fail clinical trials. Machine learning models that predict drug toxicity from molecular structure could help researchers prioritize less toxic drug candidates. However, current toxicity datasets are typically small and limited to a single organ system (e.g., cardio, renal, or liver). Creating these datasets often involved time-intensive expert curation by parsing drug labelling documents that can exceed 100 pages per drug. Here, we introduce UniTox, a unified dataset of 2,418 FDA-approved drugs with drug-induced toxicity summaries and ratings created by using GPT-4o to process FDA drug labels.
AI exoskeleton gives wheelchair users the freedom to walk again
Wandercraft's Personal Exoskeleton is about helping people stand tall, connect with others and live life on their own terms. For Caroline Laubach, being a Wandercraft test pilot is about more than just trying out new technology. It's about reclaiming a sense of freedom and connection that many wheelchair users miss. Laubach, a spinal stroke survivor and full-time wheelchair user, has played a key role in demonstrating the personal AI-powered prototype exoskeleton's development, and her experience highlights just how life-changing this device can be. "When I'm in the exoskeleton, I feel more free than I do in my daily life," said Laubach.
Heart2Mind: Human-Centered Contestable Psychiatric Disorder Diagnosis System using Wearable ECG Monitors
Nguyen, Hung, Rahimi, Alireza, Whitford, Veronica, Fournier, Hélène, Kondratova, Irina, Richard, René, Cao, Hung
Psychiatric disorders affect millions globally, yet their diagnosis faces significant challenges in clinical practice due to subjective assessments and accessibility concerns, leading to potential delays in treatment. To help address this issue, we present Heart2Mind, a human-centered contestable psychiatric disorder diagnosis system using wearable electrocardiogram (ECG) monitors. Our approach leverages cardiac biomarkers, particularly heart rate variability (HRV) and R-R intervals (RRI) time series, as objective indicators of autonomic dysfunction in psychiatric conditions. The system comprises three key components: (1) a Cardiac Monitoring Interface (CMI) for real-time data acquisition from Polar H9/H10 devices; (2) a Multi-Scale Temporal-Frequency Transformer (MSTFT) that processes RRI time series through integrated time-frequency domain analysis; (3) a Contestable Diagnosis Interface (CDI) combining Self-Adversarial Explanations (SAEs) with contestable Large Language Models (LLMs). Our MSTFT achieves 91.7% accuracy on the HRV-ACC dataset using leave-one-out cross-validation, outperforming state-of-the-art methods. SAEs successfully detect inconsistencies in model predictions by comparing attention-based and gradient-based explanations, while LLMs enable clinicians to validate correct predictions and contest erroneous ones. This work demonstrates the feasibility of combining wearable technology with Explainable Artificial Intelligence (XAI) and contestable LLMs to create a transparent, contestable system for psychiatric diagnosis that maintains clinical oversight while leveraging advanced AI capabilities. Our implementation is publicly available at: https://github.com/Analytics-Everywhere-Lab/heart2mind.
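As a rough illustration of the kind of cardiac features the abstract mentions, the sketch below computes two standard time-domain HRV statistics (SDNN and RMSSD) from an R-R interval series. It is not the MSTFT model or the authors' pipeline; the function names and the sample intervals are hypothetical and chosen only for illustration.

```python
import math
import statistics
from typing import Sequence


def sdnn(rr_ms: Sequence[float]) -> float:
    """Sample standard deviation of the R-R intervals, in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    return statistics.stdev(rr_ms)


def rmssd(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive R-R interval differences, in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical R-R intervals in milliseconds (not data from the paper).
example_rr = [800.0, 810.0, 790.0, 805.0]
print(f"SDNN  = {sdnn(example_rr):.2f} ms")
print(f"RMSSD = {rmssd(example_rr):.2f} ms")
```

Summary statistics like these discard the temporal structure of the series, which is presumably why the paper feeds the full RRI time series to a transformer instead.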
Semantic Similarity-Informed Bayesian Borrowing for Quantitative Signal Detection of Adverse Events
Haguinet, François, Painter, Jeffery L, Powell, Gregory E, Callegaro, Andrea, Bate, Andrew
We present a Bayesian dynamic borrowing (BDB) approach to enhance the quantitative identification of adverse events (AEs) in spontaneous reporting systems (SRSs). The method embeds a robust meta-analytic predictive (MAP) prior with a Bayesian hierarchical model and incorporates semantic similarity measures (SSMs) to enable weighted information sharing from clinically similar MedDRA Preferred Terms (PTs) to the target PT. This continuous similarity-based borrowing overcomes limitations of rigid hierarchical grouping in current disproportionality analysis (DPA). Using data from the FDA Adverse Event Reporting System (FAERS) between 2015 and 2019, we evaluate our approach -- termed IC SSM -- against traditional Information Component (IC) analysis and IC with borrowing at the MedDRA high-level group term level (IC HLGT). A reference set (PVLens), derived from FDA product label updates, enabled prospective evaluation of method performance in identifying AEs prior to official labeling. The IC SSM approach demonstrated higher sensitivity (1332/2337=0.570, Youden's J=0.246) than traditional IC (Se=0.501, J=0.250) and IC HLGT (Se=0.556, J=0.225), consistently identifying more true positives and doing so on average 5 months sooner than traditional IC. Despite a marginally lower aggregate F1-score and Youden's index, IC SSM showed higher performance in early post-marketing periods or when the detection threshold was raised, providing more stable and relevant alerts than IC HLGT and traditional IC. These findings support the use of SSM-informed Bayesian borrowing as a scalable and context-aware enhancement to traditional DPA methods, with potential for validation across other datasets and exploration of additional similarity metrics and Bayesian strategies using case-level data.
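The reported metrics can be checked with a few lines of arithmetic. The sketch below uses the standard definitions of sensitivity (TP / (TP + FN)) and Youden's J (sensitivity + specificity - 1). The specificity value is not stated in the abstract; it is only back-calculated here from the reported J and sensitivity, so treat it as an implied, rounded figure. The function names are illustrative.

```python
def sensitivity(true_positives: int, total_positives: int) -> float:
    """Fraction of reference positives that were detected."""
    return true_positives / total_positives


def youden_j(se: float, sp: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return se + sp - 1.0


def implied_specificity(j: float, se: float) -> float:
    """Rearranged Youden's J: specificity = J - sensitivity + 1."""
    return j - se + 1.0


# Counts and J value as reported in the abstract for IC SSM.
se_ic_ssm = sensitivity(1332, 2337)
sp_ic_ssm = implied_specificity(0.246, round(se_ic_ssm, 3))

print(f"IC SSM sensitivity: {se_ic_ssm:.3f}")
print(f"IC SSM implied specificity (derived, not reported): {sp_ic_ssm:.3f}")
```

Running the same rearrangement on the traditional IC figures (Se=0.501, J=0.250) shows why its J is slightly higher despite lower sensitivity: the implied specificity is higher.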
Automated Real-time Assessment of Intracranial Hemorrhage Detection AI Using an Ensembled Monitoring Model (EMM)
Fang, Zhongnan, Johnston, Andrew, Cheuy, Lina, Na, Hye Sun, Paschali, Magdalini, Gonzalez, Camila, Armstrong, Bonnie A., Koirala, Arogya, Laurel, Derrick, Campion, Andrew Walker, Iv, Michael, Chaudhari, Akshay S., Larson, David B.
Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM successfully categorizes confidence in the AI-generated prediction, suggesting different actions and helping improve the overall performance of AI tools to ultimately reduce cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
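One way to picture the consensus idea in this abstract is to measure how often a set of independent monitor models agrees with the black-box prediction, and to bucket that agreement into confidence levels. The sketch below is an assumption-laden illustration, not the EMM implementation: the function names, the threshold values, and the three-level labels are all hypothetical.

```python
from typing import Sequence


def agreement_fraction(primary: bool, monitors: Sequence[bool]) -> float:
    """Fraction of monitor models whose prediction matches the primary AI's."""
    if not monitors:
        raise ValueError("at least one monitor prediction is required")
    return sum(m == primary for m in monitors) / len(monitors)


def confidence_level(fraction: float, high: float = 1.0, low: float = 0.5) -> str:
    """Map agreement to a coarse label; thresholds are hypothetical."""
    if fraction >= high:
        return "high"
    if fraction > low:
        return "medium"
    return "low"


# Hypothetical case: the primary AI flags a hemorrhage, three of four monitors agree.
frac = agreement_fraction(True, [True, True, False, True])
print(frac, confidence_level(frac))
```

Because only the primary model's final output is compared, a scheme like this needs no access to its internals, which matches the black-box constraint the abstract describes.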