 asthma


Enhancing Phenotype Discovery in Electronic Health Records through Prior Knowledge-Guided Unsupervised Learning

Mayer, Melanie, Lactaoen, Kimberly, Weissman, Gary E., Himes, Blanca E., Hubbard, Rebecca A.

arXiv.org Machine Learning

Objectives: Unsupervised learning with electronic health record (EHR) data has shown promise for phenotype discovery, but approaches typically disregard existing clinical information, limiting interpretability. We operationalize a Bayesian latent class framework for phenotyping that incorporates domain-specific knowledge to improve clinical meaningfulness of EHR-derived phenotypes and illustrate its utility by identifying an asthma sub-phenotype informed by features of Type 2 (T2) inflammation. Materials and methods: We illustrate a framework for incorporating clinical knowledge into a Bayesian latent class model via informative priors to guide unsupervised clustering toward clinically relevant subgroups. This approach models missingness, accounting for potential missing-not-at-random patterns, and provides patient-level probabilities for phenotype assignment with uncertainty. Using reusable and flexible code, we applied the model to a large asthma EHR cohort, specifying informative priors for T2 inflammation-related features and weakly informative priors for other clinical variables, allowing the data to inform posterior distributions. Results and Conclusion: Using encounter data from January 2017 to February 2024 for 44,642 adult asthma patients, we found a bimodal posterior distribution of phenotype assignment, indicating clear class separation. The T2 inflammation-informed class (38.7%) was characterized by elevated eosinophil levels and allergy markers, plus high healthcare utilization and medication use, despite weakly informative priors on the latter variables. These patterns suggest an "uncontrolled T2-high" sub-phenotype. This demonstrates how our Bayesian latent class modeling approach supports hypothesis generation and cohort identification in EHR-based studies of heterogeneous diseases without well-established phenotype definitions.
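As a rough illustration of the patient-level phenotype probabilities mentioned above, the sketch below applies Bayes' rule in a two-class latent class model with binary features. It is not the authors' code: the class prevalences and feature probabilities are made-up fixed values (the paper instead places priors on them and estimates posteriors), and unobserved features are simply skipped rather than modeled as in the paper.

```python
from math import prod

def class_posterior(features, prevalence, feature_probs):
    """Posterior probability of each latent class for one patient.

    features: list of 0/1 values, or None for an unobserved feature.
    prevalence: prior probability of each class (sums to 1).
    feature_probs: feature_probs[k][j] = P(feature j = 1 | class k).
    """
    joint = []
    for pi_k, probs_k in zip(prevalence, feature_probs):
        # Features are assumed conditionally independent given the class.
        likelihood = prod(
            (p if x == 1 else 1.0 - p)
            for x, p in zip(features, probs_k)
            if x is not None  # simplification: skip unobserved features
        )
        joint.append(pi_k * likelihood)
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical parameters: class 0 = "T2-high", class 1 = "other".
prevalence = [0.4, 0.6]
feature_probs = [
    [0.8, 0.7, 0.6],  # elevated eosinophils, allergy marker, frequent visits
    [0.2, 0.3, 0.4],
]

# Patient with elevated eosinophils, an allergy marker, and unknown utilization.
posterior = class_posterior([1, 1, None], prevalence, feature_probs)
```

Here `posterior[0]` is the probability that this hypothetical patient belongs to the T2-high class given the two observed features.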


The surprising reason why growing up with dogs (and not cats) can be good for your health

Daily Mail - Science & tech



Vitamin N: Benefits of Different Forms of Public Greenery for Urban Health

Šćepanović, Sanja, Joglekar, Sagar, Law, Stephen, Quercia, Daniele, Zhou, Ke, Battiston, Alice, Schifanella, Rossano

arXiv.org Artificial Intelligence

Urban greenery is often linked to better health, yet findings from past research have been inconsistent. One reason is that official greenery metrics measure the amount or nearness of greenery but ignore how often people actually see or use it in daily life. To address this gap, we introduced a new classification that separates on-road greenery, which people see while walking through streets, from off-road greenery, which requires planned visits. We did so by combining aerial imagery of Greater London and greenery data from OpenStreetMap with quantified greenery from over 100,000 Google Street View images and accessibility estimates based on 160,000 road segments. We linked these measures to 7.45 billion medical prescriptions issued by the National Health Service and processed through our methodology. These prescriptions cover five conditions: diabetes, hypertension, asthma, depression, and anxiety, as well as opioid use. As hypothesized, we found that on-road greenery was more strongly linked to better health than four widely used official measures. For example, hypertension prescriptions dropped by 3.68% in wards with on-road greenery above the median citywide level compared to those below it. If all below-median wards reached the citywide median in on-road greenery, prescription costs could fall by up to £3.15 million each year. These results suggest that greenery seen in daily life may be more relevant than public yet secluded greenery, and that official metrics commonly used in the literature have important limitations.
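The above-versus-below-median comparison can be sketched as follows. This is only an unadjusted illustration with made-up ward values; the function name and data are hypothetical, and the paper's actual estimates may rest on a more careful analysis than this raw difference in means.

```python
from statistics import mean, median

def median_split_change(greenery, prescriptions):
    """Percent difference in mean prescriptions between wards above
    and wards below the citywide median greenery level."""
    cutoff = median(greenery)
    above = [p for g, p in zip(greenery, prescriptions) if g > cutoff]
    below = [p for g, p in zip(greenery, prescriptions) if g < cutoff]
    return 100.0 * (mean(above) - mean(below)) / mean(below)

# Made-up ward-level values (greenery share, prescriptions per 1,000 residents).
greenery = [0.10, 0.15, 0.20, 0.30, 0.35, 0.40]
prescriptions = [105.0, 100.0, 95.0, 98.0, 96.0, 94.0]

change = median_split_change(greenery, prescriptions)
```

A negative `change` means the greener wards have fewer prescriptions on average in this toy data.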


AI-enhanced conversational agents for personalized asthma support: Factors for engagement, value and efficacy

Moradbakhti, Laura, Peters, Dorian, Quint, Jennifer K., Schuller, Björn, Cook, Darren, Calvo, Rafael A.

arXiv.org Artificial Intelligence

Asthma-related deaths in the UK are the highest in Europe, and only 30% of patients access basic care. There is a need for alternative approaches to reaching people with asthma in order to provide health education, self-management support and bridges to care. Automated conversational agents (specifically, mobile chatbots) present opportunities for providing alternative and individually tailored access to health education, self-management support and risk self-assessment. But would patients engage with a chatbot, and what factors influence engagement? We present results from a patient survey (N=1257) devised by a team of asthma clinicians, patients, and technology developers, conducted to identify optimal factors for efficacy, value and engagement for a chatbot. Results indicate that most adults with asthma (53%) are interested in using a chatbot and the patients most likely to do so are those who believe their asthma is more serious and who are less confident about self-management. Results also indicate enthusiasm for 24/7 access, personalisation, and for WhatsApp as the preferred access method (compared to app, voice assistant, SMS or website). Obstacles to uptake include security/privacy concerns and skepticism of technological capabilities. We present detailed findings and consolidate these into 7 recommendations for developers for optimising efficacy of chatbot-based health support.


Pediatric Asthma Detection with Google's HeAR Model: An AI-Driven Respiratory Sound Classifier

Ehtesham, Abul, Kumar, Saket, Singh, Aditi, Khoei, Tala Talaei

arXiv.org Artificial Intelligence

Early detection of asthma in children is crucial to prevent long-term respiratory complications and reduce emergency interventions. This work presents an AI-powered diagnostic pipeline that leverages Google's Health Acoustic Representations (HeAR) model to detect early signs of asthma from pediatric respiratory sounds. The SPRSound dataset, the first open-access collection of annotated respiratory sounds in children aged 1 month to 18 years, is used to extract 2-second audio segments labeled as wheeze, crackle, rhonchi, stridor, or normal. Each segment is embedded into a 512-dimensional representation using HeAR, a foundation model pretrained on 300 million health-related audio clips, including 100 million cough sounds. Multiple classifiers, including SVM, Random Forest, and MLP, are trained on these embeddings to distinguish between asthma-indicative and normal sounds. The system achieves over 91% accuracy, with strong performance on precision-recall metrics for positive cases. In addition to classification, learned embeddings are visualized using PCA, misclassifications are analyzed through waveform playback, and ROC and confusion matrix insights are provided. This method demonstrates that short, low-resource pediatric recordings, when powered by foundation audio models, can enable fast, noninvasive asthma screening. The approach is especially promising for digital diagnostics in remote or underserved healthcare settings.


How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data

Maitra, Ishaan, Lin, Raymond, Chen, Eric, Donnelly, Jon, Šćepanović, Sanja, Rudin, Cynthia

arXiv.org Artificial Intelligence

Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. Only recently has fine-grained spatial and temporal data become available to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information. Leveraging this new resource, we use a variety of variable importance techniques to robustly identify the most informative predictors across multiple health outcomes. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR) to analyze both local and global spatial dependencies of each variable on various health outcomes. Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation. Regional analyses reveal local variations with air pollution and solar radiation, with notable shifts during COVID. This comprehensive approach provides actionable insights for addressing health disparities, and advocates for the integration of interpretable machine learning in public health.


LEARNER: A Transfer Learning Method for Low-Rank Matrix Estimation

McGrath, Sean, Zhu, Cenhao, Guo, Min, Duan, Rui

arXiv.org Machine Learning

Low-rank matrix estimation is a fundamental problem in statistics and machine learning. In the context of heterogeneous data generated from diverse sources, a key challenge lies in leveraging data from a source population to enhance the estimation of a low-rank matrix in a target population of interest. One such example is estimating associations between genetic variants and diseases in non-European ancestry groups. We propose an approach that leverages similarity in the latent row and column spaces between the source and target populations to improve estimation in the target population, which we refer to as LatEnt spAce-based tRaNsfer lEaRning (LEARNER). LEARNER is based on performing a low-rank approximation of the target population data which penalizes differences between the latent row and column spaces between the source and target populations. We present a cross-validation approach that allows the method to adapt to the degree of heterogeneity across populations. We conducted extensive simulations which found that LEARNER often outperforms the benchmark approach that only uses the target population data, especially as the signal-to-noise ratio in the source population increases. We also performed an illustrative application and empirical comparison of LEARNER and benchmark approaches in a re-analysis of a genome-wide association study in the BioBank Japan cohort. LEARNER is implemented in the R package learner.


SynSUM -- Synthetic Benchmark with Structured and Unstructured Medical Records

Rabaey, Paloma, Arno, Henri, Heytens, Stefan, Demeester, Thomas

arXiv.org Artificial Intelligence

We present the SynSUM benchmark, a synthetic dataset linking unstructured clinical notes to structured background variables. The dataset consists of 10,000 artificial patient records containing tabular variables (like symptoms, diagnoses and underlying conditions) and related notes describing the fictional patient encounter in the domain of respiratory diseases. The tabular portion of the data is generated through a Bayesian network, where both the causal structure between the variables and the conditional probabilities are proposed by an expert based on domain knowledge. We then prompt a large language model (GPT-4o) to generate a clinical note related to this patient encounter, describing the patient symptoms and additional context. The SynSUM dataset is primarily designed to facilitate research on clinical information extraction in the presence of tabular background variables, which can be linked through domain knowledge to concepts of interest to be extracted from the text - the symptoms, in the case of SynSUM. Secondary uses include research on the automation of clinical reasoning over both tabular data and text, causal effect estimation in the presence of tabular and/or textual confounders, and multi-modal synthetic data generation. The dataset can be downloaded from https://github.com/prabaey/SynSUM.
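Generating tabular records from a Bayesian network, as described above, amounts to sampling each variable after its parents. The sketch below is a minimal illustration with a hypothetical four-variable network and made-up probabilities; it does not reproduce the SynSUM network or its expert-defined parameters.

```python
import random

def sample_record(rng):
    """Draw one synthetic record by sampling each variable after its parents."""
    asthma = rng.random() < 0.10          # root node, hypothetical prior
    smoking = rng.random() < 0.20         # root node, hypothetical prior
    p_wheeze = 0.70 if asthma else 0.05   # P(wheeze | asthma)
    wheeze = rng.random() < p_wheeze
    # P(cough | asthma, smoking), given as a conditional probability table.
    cough_table = {
        (True, True): 0.80, (True, False): 0.60,
        (False, True): 0.40, (False, False): 0.10,
    }
    cough = rng.random() < cough_table[(asthma, smoking)]
    return {"asthma": asthma, "smoking": smoking,
            "wheeze": wheeze, "cough": cough}

rng = random.Random(0)  # fixed seed so the sample is reproducible
records = [sample_record(rng) for _ in range(1000)]
```

Each dictionary in `records` is one artificial patient; in a pipeline like the one described, such a record would then be passed to a language model prompt to produce the accompanying note.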


AI tongue scanner can diagnose illnesses with 96 percent accuracy

Popular Science

A new machine learning model is capable of accurately diagnosing certain illnesses nearly every time by simply looking at a patient's tongue. The novel technology, while state-of-the-art, draws its inspiration from medical approaches utilized by humans for over 2,000 years. When it comes to diagnosing ailments, traditional Chinese medicine and other practices often turn to the tongue for clues. Based on its color, shape, and thickness, the muscle can reveal a number of possible health issues, from cancer and diabetes to asthma and gastrointestinal issues. Now, after more than two millennia of peering into patient mouths for answers, doctors may soon receive a second opinion from artificial eyes powered by machine learning.


UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data

Hartung, Michael, Maier, Andreas, Delgado-Chaves, Fernando, Burankova, Yuliya, Isaeva, Olga I., Patroni, Fábio Malta de Sá, He, Daniel, Shannon, Casey, Kaufmann, Katharina, Lohmann, Jens, Savchik, Alexey, Hartebrodt, Anne, Chervontseva, Zoe, Firoozbakht, Farzaneh, Probul, Niklas, Zotova, Evgenia, Tsoy, Olga, Blumenthal, David B., Ester, Martin, Laske, Tanja, Baumbach, Jan, Zolotareva, Olga

arXiv.org Artificial Intelligence

Most complex diseases, including cancer and non-malignant diseases like asthma, have distinct molecular subtypes that require distinct clinical approaches. However, existing computational patient stratification methods have been benchmarked almost exclusively on cancer omics data and only perform well when mutually exclusive subtypes can be characterized by many biomarkers. Here, we contribute a large-scale evaluation, quantitatively exploring the power of 22 unsupervised patient stratification methods using both simulated and real transcriptome data. Building on this evaluation, we developed UnPaSt (https://apps.cosy.bio/unpast/), which optimizes unsupervised patient stratification and works even with only a limited number of subtype-predictive biomarkers. We evaluated all 23 methods on real-world breast cancer and asthma transcriptomics data. Although many methods reliably detected major breast cancer subtypes, only a few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors in both test datasets. Essentially, we showed that UnPaSt can detect many biologically insightful and reproducible patterns in omics datasets.