 healthcare system


An AI Implementation Science Study to Improve Trustworthy Data in a Large Healthcare System

Marteau, Benoit L., Hornback, Andrew, Tan, Shaun Q., Lowson, Christian, Woloff, Jason, Wang, May D.

arXiv.org Artificial Intelligence

The rapid growth of Artificial Intelligence (AI) in healthcare has sparked interest in Trustworthy AI and AI Implementation Science, both of which are essential for accelerating clinical adoption. However, strict regulations, gaps between research and clinical settings, and challenges in evaluating AI systems continue to hinder real-world implementation. This study presents an AI implementation case study within Shriners Children's (SC), a large multisite pediatric system, showcasing the modernization of SC's Research Data Warehouse (RDW) to OMOP CDM v5.4 within a secure Microsoft Fabric environment. We introduce a Python-based data quality assessment tool compatible with SC's infrastructure, extending OHDSI's R/Java-based Data Quality Dashboard (DQD) and integrating Trustworthy AI principles using the METRIC framework. This extension enhances data quality evaluation by addressing informative missingness, redundancy, timeliness, and distributional consistency. We also compare systematic and case-specific AI implementation strategies for Craniofacial Microsomia (CFM) using the FHIR standard. Our contributions include a real-world evaluation of AI implementations, integration of Trustworthy AI principles into data quality assessment, and insights into hybrid implementation strategies that blend systematic infrastructure with use-case-driven approaches to advance AI in healthcare.
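
To make three of the data quality dimensions named in this abstract concrete, here is a minimal Python sketch. It is not the authors' tool and does not use the DQD or METRIC APIs. The function names, the record layout (a list of dicts), and the field names are all hypothetical, chosen only for illustration. Missingness that differs sharply between groups hints that it may be informative, repeated keys indicate redundancy, and a long gap between an event and its load date indicates poor timeliness.

```python
from collections import defaultdict


def missing_rate_by_group(records, field, group_field):
    """Share of records with `field` missing (None), per value of `group_field`.

    Large differences between groups suggest the missingness may be informative.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for record in records:
        group = record.get(group_field)
        totals[group] += 1
        if record.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


def duplicate_rate(records, key_fields):
    """Share of records whose key repeats an earlier record (redundancy)."""
    seen = set()
    duplicates = 0
    for record in records:
        key = tuple(record.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)


def stale_rate(records, event_field, load_field, max_lag_days):
    """Share of dated records loaded more than `max_lag_days` after the event."""
    lags = [
        (record[load_field] - record[event_field]).days
        for record in records
        if record.get(event_field) is not None and record.get(load_field) is not None
    ]
    return sum(1 for lag in lags if lag > max_lag_days) / len(lags)
```

Each function returns a plain rate between 0 and 1, so the results can be compared against whatever thresholds a site chooses; the abstract does not state which thresholds the authors use.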


Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs

D'addario, Andrew Maranhão Ventura

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in primum non nocere. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven taxonomies, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the ethical design of releasing these "vulnerability signatures" to correct the information asymmetry between malicious actors and AI developers. Ultimately, this work advocates for a shift from universal to context-aware safety, providing the necessary resources to immunize healthcare AI against the nuanced, systemic threats inherent to high-stakes medical environments -- vulnerabilities that represent the paramount risk to patient safety and the successful integration of AI in healthcare systems.


Topic-aware Large Language Models for Summarizing the Lived Healthcare Experiences Described in Health Stories

Bilalpur, Maneesh, Hamm, Megan, Lee, Young Ji, Norman, Natasha, McTigue, Kathleen M., Wang, Yanshan

arXiv.org Artificial Intelligence

Storytelling is a powerful form of communication and may provide insights into factors contributing to gaps in healthcare outcomes. To determine whether Large Language Models (LLMs) can identify potential underlying factors and avenues for intervention, we performed topic-aware hierarchical summarization of narratives from African American (AA) storytellers. Fifty transcribed stories of AA experiences were used to identify topics in their experience using the Latent Dirichlet Allocation (LDA) technique. Stories about a given topic were summarized using an open-source LLM-based hierarchical summarization approach. Topic summaries were generated by summarizing across story summaries for each story that addressed a given topic. Generated topic summaries were rated for fabrication, accuracy, comprehensiveness, and usefulness by the GPT4 model, and the model's reliability was validated against the original story summaries by two domain experts. 26 topics were identified in the fifty AA stories. The GPT4 ratings suggest that topic summaries were free from fabrication, highly accurate, comprehensive, and useful. The reliability of GPT4 ratings compared to expert assessments showed moderate to high agreement. Our approach identified AA experience-relevant topics such as health behaviors, interactions with medical team members, caregiving and symptom management, among others. Such insights could help researchers identify potential factors and interventions by learning from unstructured narratives in an efficient manner, leveraging the communicative power of storytelling. The use of LDA and LLMs to identify and summarize the experience of AA individuals suggests a variety of possible avenues for health research and possible clinical improvements to support patients and caregivers, thereby ultimately improving health outcomes.


'DeepSeek is humane. Doctors are more like machines': my mother's worrying reliance on AI for health advice

The Guardian

Tired of a two-day commute to see her overworked doctor, my mother turned to tech for help with her kidney disease. Every few months, my mother, a 57-year-old kidney transplant patient who lives in a small city in eastern China, embarks on a two-day journey to see her doctor. She fills her backpack with a change of clothes, a stack of medical reports and a few boiled eggs to snack on. Then, she takes a 90-minute ride on a high-speed train and checks into a hotel in the eastern metropolis of Hangzhou. At 7am the next day, she lines up with hundreds of others to get her blood taken in a long hospital hall that buzzes like a crowded marketplace. In the afternoon, when the lab results arrive, she makes her way to a specialist's clinic. She gets about three minutes with the doctor. Then, my mother packs up and starts the long commute home. My mother began using China's leading AI chatbot to diagnose her symptoms this past winter. She would lie down on her couch and open the app on her iPhone. "Hi," she said in her first message to the chatbot, on 2 February. "How can I assist you today?" the system responded instantly, adding a smiley emoji.


Intercept Cancer: Cancer Pre-Screening with Large Scale Healthcare Foundation Models

Sun, Liwen, Yao, Hao-Ren, Gao, Gary, Frieder, Ophir, Xiong, Chenyan

arXiv.org Artificial Intelligence

Cancer screening, leading to early detection, saves lives. Unfortunately, existing screening techniques require expensive and intrusive medical procedures that are not globally available, resulting in too many lost would-be-saved lives. We present CATCH-FM, CATch Cancer early with Healthcare Foundation Models, a cancer pre-screening methodology that identifies high-risk patients for further screening solely based on their historical medical records. With millions of electronic healthcare records (EHR), we establish the scaling law of EHR foundation models pretrained on medical code sequences, pretrain compute-optimal foundation models of up to 2.4 billion parameters, and finetune them on clinician-curated cancer risk prediction cohorts. In our retrospective evaluation comprising thirty thousand patients, CATCH-FM achieves strong efficacy, with 50% sensitivity in predicting first cancer risks at 99% specificity cutoff, and outperforming feature-based tree models and both general and medical LLMs by up to 20% AUPRC. Despite significant demographic, healthcare system, and EHR coding differences, CATCH-FM achieves state-of-the-art pancreatic cancer risk prediction on the EHRSHOT few-shot leaderboard, outperforming EHR foundation models pretrained using on-site patient data. Our analysis demonstrates the robustness of CATCH-FM in various patient distributions, the benefits of operating in the ICD code space, and its ability to capture non-trivial cancer risk factors. Our code will be open-sourced.
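
The headline metric in this abstract, sensitivity at a fixed specificity cutoff, can be illustrated with a short Python sketch. This is a generic illustration of the metric, not the authors' evaluation code. The function name and the convention that a patient is flagged when the risk score is strictly above the threshold are assumptions made here.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity=0.99):
    """Return (threshold, sensitivity) at the lowest threshold meeting the target.

    A patient is flagged when score > threshold (an assumed convention).
    Specificity is the share of negatives (label 0) that are not flagged;
    sensitivity is the share of positives (label 1) that are flagged.
    """
    if not 0 < target_specificity <= 1:
        raise ValueError("target_specificity must be in (0, 1]")
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative example")

    # Number of negatives that must fall at or below the threshold.
    # The small epsilon guards against floating-point overshoot in the product.
    needed = max(1, math.ceil(target_specificity * len(negatives) - 1e-12))
    threshold = negatives[needed - 1]

    flagged_positives = sum(1 for s in positives if s > threshold)
    return threshold, flagged_positives / len(positives)
```

Read this way, "50% sensitivity at 99% specificity" means that when the threshold is set so that at most about 1% of cancer-free patients are flagged, half of the patients who go on to develop cancer are flagged.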


Beyond Linear Steering: Unified Multi-Attribute Control for Language Models

Oozeer, Narmeen, Marks, Luke, Barez, Fazl, Abdullah, Amirali

arXiv.org Artificial Intelligence

Controlling multiple behavioral attributes in large language models (LLMs) at inference time is a challenging problem due to interference between attributes and the limitations of linear steering methods, which assume additive behavior in activation space and require per-attribute tuning. We introduce K-Steering, a unified and flexible approach that trains a single non-linear multi-label classifier on hidden activations and computes intervention directions via gradients at inference time. This avoids linearity assumptions, removes the need for storing and tuning separate attribute vectors, and allows dynamic composition of behaviors without retraining. To evaluate our method, we propose two new benchmarks, ToneBank and DebateMix, targeting compositional behavioral control. Empirical results across 3 model families, validated by both activation-based classifiers and LLM-based judges, demonstrate that K-Steering outperforms strong baselines in accurately steering multiple behaviors.
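
The core idea described in this abstract, taking the gradient of a non-linear classifier with respect to a hidden activation and moving the activation along it, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the K-Steering implementation: the one-hidden-layer tanh classifier, the objective (sum of the target attributes' logits), the single fixed step size `alpha`, and all function names are hypothetical, since the abstract does not specify these details.

```python
import math


def mlp_logits(h, W1, b1, W2, b2):
    """Forward pass of a toy multi-label classifier on an activation vector h."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(W1, b1)
    ]
    logits = [
        sum(w * z for w, z in zip(row, hidden)) + b
        for row, b in zip(W2, b2)
    ]
    return hidden, logits


def steering_gradient(h, W1, b1, W2, targets):
    """Gradient of the summed target logits with respect to h.

    The output bias does not affect this gradient, so the forward pass
    is run with zero output biases.
    """
    hidden, _ = mlp_logits(h, W1, b1, W2, [0.0] * len(W2))
    # d(sum of target logits) / d(hidden unit j)
    d_hidden = [sum(W2[k][j] for k in targets) for j in range(len(hidden))]
    # Backpropagate through tanh: derivative is 1 - tanh(x)^2.
    d_pre = [g * (1.0 - z * z) for g, z in zip(d_hidden, hidden)]
    # Backpropagate through the first linear layer.
    return [
        sum(d_pre[j] * W1[j][i] for j in range(len(W1)))
        for i in range(len(h))
    ]


def steer(h, grad, alpha):
    """Move the activation a step of size alpha along the gradient."""
    return [x + alpha * g for x, g in zip(h, grad)]
```

Because the direction is recomputed from the current activation and the chosen set of target attributes, combining attributes only requires changing `targets`, which is one way to read the abstract's claim of composing behaviors without storing a separate vector per attribute.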


The Collaborations among Healthcare Systems, Research Institutions, and Industry on Artificial Intelligence Research and Development

Ye, Jiancheng, Ma, Michelle, Abuhashish, Malak

arXiv.org Artificial Intelligence

Objectives: The integration of Artificial Intelligence (AI) in healthcare promises to revolutionize patient care, diagnostics, and treatment protocols. Collaborative efforts among healthcare systems, research institutions, and industry are pivotal to leveraging AI's full potential. This study aims to characterize collaborative networks and stakeholders in AI healthcare initiatives, identify challenges and opportunities within these collaborations, and elucidate priorities for future AI research and development. Methods: This study utilized data from the Chinese Society of Radiology and the Chinese Medical Imaging AI Innovation Alliance. A national cross-sectional survey was conducted in China (N = 5,142) across 31 provincial administrative regions, involving participants from three key groups: clinicians, institution professionals, and industry representatives. The survey explored diverse aspects including current AI usage in healthcare, collaboration dynamics, challenges encountered, and research and development priorities. Results: Findings reveal high interest in AI among clinicians, with a significant gap between interest and actual engagement in development activities. Despite the willingness to share data, progress is hindered by concerns about data privacy and security, and lack of clear industry standards and legal guidelines. Future development interests focus on lesion screening, disease diagnosis, and enhancing clinical workflows. Conclusion: This study highlights an enthusiastic yet cautious approach toward AI in healthcare, characterized by significant barriers that impede effective collaboration and implementation. Recommendations emphasize the need for AI-specific education and training, secure data-sharing frameworks, establishment of clear industry standards, and formation of dedicated AI research departments.


AI for Senior Citizens

Communications of the ACM

We are now living longer, and the number of people worldwide aged 65 and over is expected to grow from 703 million in 2019 to 2.2 billion in 2080, according to the World Population Prospects Report published by the United Nations last year. The proportion of the global population that is elderly is also on the rise, almost doubling from 5.5% in 1974 to 10.3% last year, and it is projected to grow to 20.7% by 2074. A consequence of aging is that we are more likely to have medical problems. At the same time, the healthcare system in many countries is already stretched due to a lack of workers. "There are just not enough doctors and nurses to deal with a growing elderly population," said Massimiliano Zecca, a professor of healthcare technology at Loughborough University in the U.K. In the U.S., for example, a severe shortage of doctors is expected by 2034, with between 37,800 and 124,000 physicians lacking, partly fueled by the growing number of seniors, according to a recent report by the Association of American Medical Colleges (AAMC).


Trump to unveil new MAHA initiatives at 'Make Health Tech Great Again' White House event

FOX News

Trump is expected to roll out a DOGE-backed plan to "encourage more seamless sharing of health-care data" between states and the federal government. The White House is poised to unveil new details on Wednesday surrounding the Trump administration's efforts to advance healthcare technology and partnerships with private-sector technology companies. The "Make Health Tech Great Again" event is expected to provide more details on how the administration is advancing a "next-generation digital health ecosystem," after securing partnerships with companies including Amazon, Anthropic, Apple, Google, and OpenAI to better share information between patients and providers within Medicare and Medicaid services. U.S. Health and Human Services Secretary Robert F. Kennedy Jr. announced that the HHS will ban illegal immigrants from accessing taxpayer-funded programs. "For decades, bureaucrats and entrenched interests buried health data and blocked patients from taking control of their health," Department of Health and Human Services Secretary Robert F. Kennedy, Jr. said in a statement Wednesday ahead of the event.


Voice-based AI Agents: Filling the Economic Gaps in Digital Health Delivery

Wen, Bo, Wang, Chen, Han, Qiwei, Norel, Raquel, Liu, Julia, Stappenbeck, Thaddeus, Rogers, Jeffrey L.

arXiv.org Artificial Intelligence

The integration of voice-based AI agents in healthcare presents a transformative opportunity to bridge economic and accessibility gaps in digital health delivery. This paper explores the role of large language model (LLM)-powered voice assistants in enhancing preventive care and continuous patient monitoring, particularly in underserved populations. Drawing insights from the development and pilot study of Agent PULSE (Patient Understanding and Liaison Support Engine)--a collaborative initiative between IBM Research, Cleveland Clinic Foundation, and Morehouse School of Medicine--we present an economic model demonstrating how AI agents can provide cost-effective healthcare services where human intervention is economically unfeasible. Our pilot study with 33 inflammatory bowel disease patients revealed that 70% expressed acceptance of AI-driven monitoring, with 37% preferring it over traditional modalities. Technical challenges, including real-time conversational AI processing, integration with healthcare systems, and privacy compliance, are analyzed alongside policy considerations surrounding regulation, bias mitigation, and patient autonomy. Our findings suggest that AI-driven voice agents not only enhance healthcare scalability and efficiency but also improve patient engagement and accessibility. For healthcare executives, our cost-utility analysis demonstrates huge potential savings for routine monitoring tasks, while technologists can leverage our framework to prioritize improvements yielding the highest patient impact. By addressing current limitations and aligning AI development with ethical and regulatory frameworks, voice-based AI agents can serve as a critical entry point for equitable, sustainable digital healthcare solutions. Healthcare systems worldwide face growing challenges in allocating limited medical resources to meet increasing demand [1], [2].
Traditional healthcare delivery models, centered on episodic patient-provider interactions, often result in significant gaps in continuous care, particularly in preventive health monitoring and chronic disease management [2], [3]. These shortcomings disproportionately affect vulnerable populations, including those with limited access to healthcare facilities [4], lower technological literacy [5], or socio-economic constraints [6]. The advent of Large Language Models (LLMs) and multi-modal AI has opened new avenues for digital health applications [7]-[10], notably in voice-based patient engagement [11], [12]. Unlike earlier rule-based conversational agents, modern AI-driven voice assistants can facilitate context-aware, adaptive, and natural conversations that dynamically adjust to user preferences, health literacy levels, and immediate needs [13]. Voice, as humanity's most intuitive mode of communication, reduces engagement barriers and broadens access to healthcare, especially for underserved communities [12], [14].