cell population
MoRE-GNN: Multi-omics Data Integration with a Heterogeneous Graph Autoencoder
Wang, Zhiyu, Koszut, Sonia, Liò, Pietro, Ceccarelli, Francesco
The integration of multi-omics single-cell data remains challenging due to high dimensionality and complex inter-modality relationships. To address this, we introduce MoRE-GNN (Multi-omics Relational Edge Graph Neural Network), a heterogeneous graph autoencoder that combines graph convolution and attention mechanisms to dynamically construct relational graphs directly from data. Evaluations on six publicly available datasets demonstrate that MoRE-GNN captures biologically meaningful relationships and outperforms existing methods, particularly in settings with strong inter-modality correlations. Furthermore, the learned representations allow for accurate downstream cross-modal predictions. While performance may vary with dataset complexity, MoRE-GNN offers an adaptive, scalable and interpretable framework for advancing multi-omics integration.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Asia > Middle East > Israel (0.04)
Enhanced Single-Cell RNA-seq Embedding through Gene Expression and Data-Driven Gene-Gene Interaction Integration
Goudarzi, Hojjat Torabi, Pouyan, Maziyar Baran
Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity, enabling detailed analysis of complex biological systems at single-cell resolution. However, the high dimensionality and technical noise inherent in scRNA-seq data pose significant analytical challenges. While current embedding methods focus primarily on gene expression levels, they often overlook crucial gene-gene interactions that govern cellular identity and function. To address this limitation, we present a novel embedding approach that integrates both gene expression profiles and data-driven gene-gene interactions. Our method first constructs a Cell-Leaf Graph (CLG) using random forest models to capture regulatory relationships between genes, while simultaneously building a K-Nearest Neighbor Graph (KNNG) to represent expression similarities between cells. These graphs are then combined into an Enriched Cell-Leaf Graph (ECLG), which serves as input for a graph neural network to compute cell embeddings. By incorporating both expression levels and gene-gene interactions, our approach provides a more comprehensive representation of cellular states. Extensive evaluation across multiple datasets demonstrates that our method enhances the detection of rare cell populations and improves downstream analyses such as visualization, clustering, and trajectory inference. This integrated approach represents a significant advance in single-cell data analysis, offering a more complete framework for understanding cellular diversity and dynamics.
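The K-Nearest Neighbor Graph step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_graph`, the Euclidean distance, and the toy expression values are all assumptions made for illustration, and the random-forest-based Cell-Leaf Graph is not shown.

```python
import math

def knn_graph(cells, k):
    """Map each cell index to the indices of its k nearest cells.

    Distance is plain Euclidean distance on the expression vectors
    (an assumption; the abstract does not name the metric).
    """
    graph = {}
    for i, ci in enumerate(cells):
        # Distance from cell i to every other cell, paired with that cell's index.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(cells) if j != i]
        dists.sort()
        graph[i] = [j for _, j in dists[:k]]
    return graph

# Toy expression matrix: 4 cells x 2 genes (made-up values).
cells = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
graph = knn_graph(cells, k=1)
```

In this toy input the two low-expression cells become each other's neighbors, and likewise the two high-expression cells. A real pipeline would typically use an approximate nearest-neighbor library rather than this quadratic loop.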
- North America > United States > Oregon (0.04)
- North America > United States > Ohio > Lucas County > Oregon (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data
Hyun, Sangwon, Coleman, Tim, Ribalet, Francois, Bien, Jacob
Ocean microbes are critical to both ocean ecosystems and the global climate. Flow cytometry, which measures cell optical properties in fluid samples, is routinely used in oceanographic research. Despite decades of accumulated data, identifying key microbial populations (a process known as "gating") remains a significant analytical challenge. To address this, we focus on gating multidimensional, high-frequency flow cytometry data collected continuously on board oceanographic research vessels, capturing time- and space-wise variations in the dynamic ocean. Our paper proposes a novel mixture-of-experts model in which both the gating function and the experts are given by trend filtering. The model leverages two key assumptions: (1) Each snapshot of flow cytometry data is a mixture of multivariate Gaussians and (2) the parameters of these Gaussians vary smoothly over time. Our method uses regularization and a constraint to ensure smoothness and that cluster means match biologically distinct microbe types. We demonstrate, using flow cytometry data from the North Pacific Ocean, that our proposed model accurately matches human-annotated gating and corrects significant errors.
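The two modelling assumptions in the abstract above (a Gaussian mixture at each time point, with parameters that drift smoothly) can be made concrete with a small sketch. It is a one-dimensional toy, not the paper's model: the linear drift in `means_at`, the equal weights and the unit standard deviations are all hypothetical, and no trend filtering or parameter estimation is performed.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, sds):
    """Posterior probability that observation x belongs to each component."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def gate(x, weights, means, sds):
    """Assign x to the component with the highest posterior probability."""
    r = responsibilities(x, weights, means, sds)
    return max(range(len(r)), key=r.__getitem__)

def means_at(t):
    """Hypothetical smooth drift: the first cluster mean moves slowly with time."""
    return [0.1 * t, 5.0]

weights, sds = [0.5, 0.5], [1.0, 1.0]
early = gate(2.8, weights, means_at(0), sds)   # cluster means are 0.0 and 5.0
late = gate(2.8, weights, means_at(10), sds)   # cluster means are 1.0 and 5.0
```

The same measurement is assigned to different clusters at the two time points because the cluster boundary moves as the first mean drifts, which is why a single static gate is a poor fit for continuously collected data.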
- Pacific Ocean > North Pacific Ocean (0.24)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
Large Language Models Meet Graph Neural Networks for Text-Numeric Graph Reasoning
Song, Haoran, Feng, Jiarui, Li, Guangfu, Province, Michael, Payne, Philip, Chen, Yixin, Li, Fuhai
In real-world scientific discovery, human beings always make use of accumulated prior knowledge and imagination to select one or a few of the most promising hypotheses from large and noisy data analysis results. In this study, we introduce a new type of graph structure, the text-numeric graph (TNG), which is defined as a graph whose entities and associations have both text-attributed information and numeric information. The TNG is an ideal data structure model for novel scientific discovery via graph reasoning because it integrates human-understandable textual annotations or prior knowledge with numeric values that represent the observed or activation levels of graph entities or associations in different samples. Together, the textual information and numeric values determine the importance of graph entities and associations in graph reasoning for novel scientific knowledge discovery. We further propose integrating large language models (LLMs) and graph neural networks (GNNs) to analyze the TNGs for graph understanding and reasoning. To demonstrate the utility, we generated text-omic (numeric) signaling graphs (TOSGs), as one type of TNG, in which all graphs have the same entities, associations and annotations, but have sample-specific entity numeric (omic) values, using single-cell RNAseq (scRNAseq) datasets of different diseases. We proposed joint LLM-GNN models for key entity mining and signaling pathway mining on the TOSGs. The evaluation results showed that the LLM-GNN and TNG models significantly improve classification accuracy and network inference. In conclusion, the TNGs and joint LLM-GNN models are important approaches for scientific discovery.
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- North America > United States > Connecticut (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)
Reinforcement Learning for Optimal Control of Adaptive Cell Populations
Kratz, Josiah C., Adamczyk, Jacob
Many organisms and cell types, from bacteria to cancer cells, exhibit a remarkable ability to adapt to fluctuating environments. Additionally, cells can leverage memory of past environments to better survive previously-encountered stressors. From a control perspective, this adaptability poses significant challenges in driving cell populations toward extinction, and is thus an open question with great clinical significance. In this work, we focus on drug dosing in cell populations exhibiting phenotypic plasticity. For specific dynamical models switching between resistant and susceptible states, exact solutions are known. However, when the underlying system parameters are unknown, and for complex memory-based systems, obtaining the optimal solution is currently intractable. To address this challenge, we apply reinforcement learning (RL) to identify informed dosing strategies to control cell populations evolving under novel non-Markovian dynamics. We find that model-free deep RL is able to recover exact solutions and control cell populations even in the presence of long-range temporal dynamics.
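The control problem described in the abstract above can be pictured with a toy two-state environment that a dosing policy interacts with. Everything here is an assumption for illustration: the rate constants, the Euler time step, and the function names are hypothetical, and the dynamics are memoryless, unlike the non-Markovian systems the paper studies. No RL agent is trained; the sketch only compares two fixed policies.

```python
def simulate(policy, steps=200, dt=0.05):
    """Euler simulation of susceptible (s) and resistant (r) cell populations.

    policy(step, s, r) returns a dose in [0, 1]. All rates are made up.
    Returns the total population size at the end of the run.
    """
    s, r = 1.0, 0.0
    for step in range(steps):
        dose = policy(step, s, r)
        growth_s = 1.0 - 3.0 * dose          # the drug kills susceptible cells
        growth_r = 0.2                       # resistant cells grow slowly, drug has no effect
        to_resistant = 0.5 * dose            # switching to resistance under drug
        to_susceptible = 0.5 * (1.0 - dose)  # switching back when the drug is absent
        ds = growth_s * s - to_resistant * s + to_susceptible * r
        dr = growth_r * r + to_resistant * s - to_susceptible * r
        s += dt * ds
        r += dt * dr
    return s + r

def never_dose(step, s, r):
    return 0.0

def always_dose(step, s, r):
    return 1.0

untreated = simulate(never_dose)
treated = simulate(always_dose)
```

Under these made-up rates, constant dosing ends with a smaller population than no dosing, but the resistant subpopulation keeps growing. That trade-off is what makes the choice of dosing schedule a non-trivial control problem.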
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Automated Immunophenotyping Assessment for Diagnosing Childhood Acute Leukemia using Set-Transformers
Lygizou, Elpiniki Maria, Reiter, Michael, Maurer-Granofszky, Margarita, Dworzak, Michael, Grosu, Radu
Acute Leukemia is the most common hematologic malignancy in children and adolescents. A key methodology in the diagnostic evaluation of this malignancy is immunophenotyping based on Multiparameter Flow Cytometry (FCM). However, this approach is manual, and thus time-consuming and subjective. To alleviate this situation, we propose in this paper the FCM-Former, a machine learning, self-attention based FCM-diagnostic tool, automating the immunophenotyping assessment in Childhood Acute Leukemia. The FCM-Former is trained in a supervised manner, by directly using flow cytometric data. Our FCM-Former achieves an accuracy of 96.5% in assigning lineage to each sample among 960 cases of acute B-cell lymphoblastic, T-cell lymphoblastic, and acute myeloid leukemia (B-ALL, T-ALL, AML). To the best of our knowledge, the FCM-Former is the first work that automates the immunophenotyping assessment with FCM data in diagnosing pediatric Acute Leukemia.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Europe > Switzerland (0.04)
- Health & Medicine > Therapeutic Area > Oncology > Leukemia (1.00)
- Health & Medicine > Therapeutic Area > Hematology (1.00)
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking
Bini, Lorenzo, Mojarrad, Fatemeh Nassajian, Liarou, Margarita, Matthes, Thomas, Marchand-Maillet, Stéphane
This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, Monocytes, Mast cells, and Hematopoietic Stem/Progenitor Cells (HSPCs). Experiments utilize supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs demonstrate superior performance by exploiting spatial relationships in graph-encoded data. The benchmark allows standardized evaluation of clinically relevant classification tasks, along with exploratory analyses to gain insights into hematological cell phenotypes. This represents the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset. It will empower the development and rigorous assessment of novel methodologies for single-cell analysis.
- North America > United States (0.14)
- Europe > Switzerland > Geneva > Geneva (0.05)
- Europe > Monaco (0.04)
GateNet: A novel Neural Network Architecture for Automated Flow Cytometry Gating
Fisch, Lukas, Heming, Michael O., Schulte-Mecklenbeck, Andreas, Gross, Catharina C., Zumdick, Stefan, Barkhau, Carlotta, Emden, Daniel, Ernsting, Jan, Leenings, Ramona, Sarink, Kelvin, Winter, Nils R., Dannlowski, Udo, Wiendl, Heinz, Hörste, Gerd Meyer zu, Hahn, Tim
Flow cytometry (FC) is an analytical technique which is used in biological research to identify cell types and in the clinical context to diagnose human diseases including hematological malignancies [1]. FC characterizes cell types by measuring the light scatter and fluorescence emission properties of fluorochrome-labeled antibodies from each of the thousands of cells a sample contains [2]. Based on the measured intensity of the fluorescence and the light scatter of these cell events, cells are distinguished from contaminants, and then each cell is classified into a specific cell population. Traditionally, this classification is done by manually identifying and partitioning (i.e. 'gating') these populations based on visual inspection of mostly two-dimensional intensity histograms of two respective fluorescence emission detectors (Figure 1). [Figure 1: Schematic manual gating workflow which corrects for measurement variances across samples caused by the batch effect.] The first obstacle during gating is the batch effect, i.e. technical variance of event measurements across samples, caused e.g. by the variability of the staining procedure or by the decay of the exciting laser and the fluorescence emissions of fluorophore-bound antibodies.
- North America > United States (0.14)
- Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)
- Europe > Switzerland (0.04)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Hematology (0.86)
Uncertainty Wrapper in the medical domain: Establishing transparent uncertainty quantification for opaque machine learning models in practice
Jöckel, Lisa, Kläs, Michael, Popp, Georg, Hilger, Nadja, Fricke, Stephan
When systems use data-based models that are based on machine learning (ML), errors in their results cannot be ruled out. This is particularly critical if it remains unclear to the user how these models arrived at their decisions and if errors can have safety-relevant consequences, as is often the case in the medical field. In such cases, the use of dependable methods to quantify the uncertainty remaining in a result allows the user to make an informed decision about further usage and draw possible conclusions based on a given result. This paper demonstrates the applicability and practical utility of the Uncertainty Wrapper using flow cytometry as an application from the medical field that can benefit from the use of ML models in conjunction with dependable and transparent uncertainty quantification.
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
- Health & Medicine > Therapeutic Area > Immunology (0.50)
Mathematical Modeling of BCG-based Bladder Cancer Treatment Using Socio-Demographics
Savchenko, Elizaveta, Rosenfeld, Ariel, Bunimovich-Mendrazitsky, Svetlana
Cancer is one of the most widespread diseases around the world, with millions of new patients each year. Bladder cancer (BC) is one of the most prevalent types of cancer, affecting all individuals alike with no obvious prototypical patient. The current standard treatment for BC follows a routine weekly Bacillus Calmette-Guerin (BCG) immunotherapy protocol which is applied to all patients alike. The clinical outcomes associated with BCG treatment vary significantly among patients due to the biological and clinical complexity of the interaction between the immune system, treatments, and cancer cells. In this study, we take advantage of the patient's socio-demographics to offer a personalized mathematical model that describes the clinical dynamics associated with BCG-based treatment. To this end, we adopt a well-established BCG treatment model and integrate a machine learning component to temporally adjust and reconfigure key parameters within the model, thus promoting its personalization. Using real clinical data, we show that our personalized model compares favorably with the original one in predicting the number of cancer cells at the end of the treatment, with a 14.8% improvement on average.