Large Language Models Meet Graph Neural Networks for Text-Numeric Graph Reasoning

Song, Haoran, Feng, Jiarui, Li, Guangfu, Province, Michael, Payne, Philip, Chen, Yixin, Li, Fuhai

arXiv.org Artificial Intelligence

In real-world scientific discovery, human beings draw on accumulated prior knowledge, together with imagination, to select one or a few of the most promising hypotheses from large and noisy data analysis results. In this study, we introduce a new type of graph structure, the text-numeric graph (TNG), defined as a graph whose entities and associations carry both text-attributed and numeric information. The TNG is an ideal data structure for novel scientific discovery via graph reasoning because it integrates human-understandable textual annotations, or prior knowledge, with numeric values that represent the observed or activation levels of graph entities or associations in different samples. Together, the textual information and numeric values determine the importance of graph entities and associations in graph reasoning for novel scientific knowledge discovery. We further propose integrating large language models (LLMs) and graph neural networks (GNNs) to analyze TNGs for graph understanding and reasoning. To demonstrate the utility, we generated text-omic (numeric) signaling graphs (TOSGs), one type of TNG, in which all graphs share the same entities, associations, and annotations but carry sample-specific entity numeric (omic) values derived from single-cell RNA-seq (scRNAseq) datasets of different diseases. We propose joint LLM-GNN models for key entity mining and signaling pathway mining on the TOSGs. The evaluation results showed that the joint LLM-GNN models on TNGs significantly improve classification accuracy and network inference. In conclusion, TNGs and joint LLM-GNN models are important approaches for scientific discovery.
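The TNG idea described in the abstract — shared topology and annotations, sample-specific numeric values — can be illustrated with a minimal sketch. All names here are hypothetical and not taken from the paper's code:

```python
from dataclasses import dataclass, field

@dataclass
class TNGNode:
    """A text-numeric graph entity: textual annotation plus numeric values."""
    name: str
    annotation: str                             # human-readable prior knowledge
    values: dict = field(default_factory=dict)  # sample id -> observed level

@dataclass
class TNGEdge:
    """A text-numeric association between two entities."""
    source: str
    target: str
    annotation: str
    weight: float = 0.0                         # observed association strength

def sample_specific_graph(nodes, sample_id):
    """Project a shared-topology TNG onto one sample's numeric values."""
    return {n.name: (n.annotation, n.values.get(sample_id, 0.0)) for n in nodes}

# Example: one signaling entity measured in two samples
tnf = TNGNode("TNF", "pro-inflammatory cytokine", {"s1": 2.3, "s2": 0.1})
graph_s1 = sample_specific_graph([tnf], "s1")
```

In an LLM-GNN pipeline, the `annotation` string would feed a text encoder and the per-sample `values` would feed the GNN's node features; this sketch only shows the data structure, not the models.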


Spatially-Delineated Domain-Adapted AI Classification: An Application for Oncology Data

Farhadloo, Majid, Sharma, Arun, Leontovich, Alexey, Markovic, Svetomir N., Shekhar, Shashi

arXiv.org Artificial Intelligence

Given multi-type point maps from different place-types (e.g., tumor regions), our objective is to develop a classifier trained on the source place-type to accurately distinguish between two classes of the target place-type based on their point arrangements. This problem is societally important for many applications, such as generating clinical hypotheses for designing new immunotherapies for cancer treatment. The challenge lies in the spatial variability, the inherent heterogeneity and variation observed in spatial properties or arrangements across different locations (i.e., place-types). Previous techniques focus on self-supervised tasks to learn domain-invariant features and mitigate domain differences; however, they often neglect the underlying spatial arrangements among data points, leading to significant discrepancies across different place-types. We explore a novel multi-task self-learning framework that targets spatial arrangements, such as spatial mix-up masking and spatial contrastive predictive coding, for spatially-delineated domain-adapted AI classification. Experimental results on real-world datasets (e.g., oncology data) show that the proposed framework provides higher prediction accuracy than baseline methods.


Immunocto: a massive immune cell database auto-generated for histopathology

Simard, Mikaël, Shen, Zhuoyan, Hawkins, Maria A., Collins-Fekete, Charles-Antoine

arXiv.org Artificial Intelligence

With the advent of novel cancer treatment options such as immunotherapy, studying the tumour immune micro-environment is crucial to inform on prognosis and understand response to therapeutic agents. A key approach to characterising the tumour immune micro-environment may be through combining (1) digitised microscopic high-resolution optical images of hematoxylin and eosin (H&E) stained tissue sections obtained in routine histopathology examinations with (2) automated immune cell detection and classification methods. However, current individual immune cell classification models for digital pathology perform relatively poorly. This is mainly due to the limited size of currently available datasets of individual immune cells, a consequence of the time-consuming and difficult problem of manually annotating immune cells on digitised H&E whole slide images. In that context, we introduce Immunocto, a massive, automatically generated database of 6,848,454 human cells, including 2,282,818 immune cells distributed across 4 subtypes: CD4$^+$ T cell lymphocytes, CD8$^+$ T cell lymphocytes, B cell lymphocytes, and macrophages. For each cell, we provide a 64$\times$64 pixels H&E image at $\mathbf{40}\times$ magnification, along with a binary mask of the nucleus and a label. To create Immunocto, we combined open-source models and data to automatically generate the majority of contours and labels. The cells are obtained from a matched H&E and immunofluorescence colorectal dataset from the Orion platform, while contours are obtained using the Segment Anything Model. A classifier trained on H&E images from Immunocto produces an average F1 score of 0.74 to differentiate the 4 immune cell subtypes and other cells. Immunocto can be downloaded at: https://zenodo.org/uploads/11073373.


Multi-omics Prediction from High-content Cellular Imaging with Deep Learning

Mehrizi, Rahil, Mehrjou, Arash, Alegro, Maryana, Zhao, Yi, Carbone, Benedetta, Fishwick, Carl, Vappiani, Johanna, Bi, Jing, Sanford, Siobhan, Keles, Hakan, Bantscheff, Marcus, Nguyen, Cuong, Schwab, Patrick

arXiv.org Artificial Intelligence

High-content cellular imaging, transcriptomics, and proteomics data provide rich and complementary views on the molecular layers of biology that influence cellular states and function. However, the biological determinants through which changes in multi-omics measurements influence cellular morphology have not yet been systematically explored, and the degree to which multi-omics measurements could be predicted directly from cell imaging data is therefore currently unclear. Here, we address the question of whether it is possible to predict bulk multi-omics measurements directly from cell images using Image2Omics -- a deep learning approach that predicts multi-omics in a cell population directly from high-content images stained with multiplexed fluorescent dyes. We perform an experimental evaluation in gene-edited macrophages derived from human induced pluripotent stem cells (hiPSCs) under multiple stimulation conditions and demonstrate that Image2Omics achieves significantly better performance in predicting transcriptomics and proteomics measurements directly from cell images than predictors based on the mean observed training set abundance. We observed significant predictability of abundances for 5903 (22.43%; 95% CI: 8.77%, 38.88%) and 5819 (22.11%; 95% CI: 10.40%, 38.08%) transcripts out of 26137 in M1- and M2-stimulated macrophages respectively, and for 1933 (38.77%; 95% CI: 36.94%, 39.85%) and 2055 (41.22%; 95% CI: 39.31%, 42.42%) proteins out of 4986 in M1- and M2-stimulated macrophages respectively. Our results show that some transcript and protein abundances are predictable from cell imaging and that cell imaging may potentially, in some settings and depending on the mechanisms of interest and desired performance threshold, even be a scalable and resource-efficient substitute for multi-omics measurements.
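The baseline the abstract compares against — predicting each analyte's abundance as its mean observed training-set value — can be sketched as follows. This is a simplified illustration with made-up data, not the authors' evaluation code:

```python
import numpy as np

def mean_baseline_mse(train, test):
    """Per-analyte MSE when predicting each analyte by its training-set mean.

    train, test: arrays of shape (samples, analytes), e.g. transcript or
    protein abundances. A model "significantly predicts" an analyte when
    its error beats this baseline.
    """
    pred = train.mean(axis=0)                 # one mean abundance per analyte
    return ((test - pred) ** 2).mean(axis=0)  # per-analyte squared error

rng = np.random.default_rng(0)
train = rng.normal(size=(50, 4))  # 50 training samples x 4 analytes
test = rng.normal(size=(20, 4))   # 20 held-out samples
baseline = mean_baseline_mse(train, test)
```

Any image-based predictor then only counts an analyte as predictable if its held-out error is significantly below the corresponding entry of `baseline`.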


Deep learning improves the accuracy of multispectral image analysis for digital pathology

#artificialintelligence

The development of digital pathology has resulted in radically better patient care and medical research. The cellular microenvironment is complex, but the introduction of tools like Imaging Mass Cytometry™ (IMC™, Fluidigm) and high-plex fluorescence staining panels has boosted researchers' capacity to probe it. At the same time, spotting significant variations in these datasets has become more difficult as the volume and complexity of the data the technologies produce increase. Dr. Heather Stevenson is the Director of Transplantation Pathology and an Associate Professor at the University of Texas Medical Branch. Her main area of study is hepatic immunology, specifically how dysregulation of the immune system contributes to fibrosis development.


New test uses nanotechnology, artificial intelligence to diagnose TB in children

#artificialintelligence

A new blood test developed by Tulane University researchers combines nanotechnology with artificial intelligence to diagnose tuberculosis (TB) in children in instances when the deadly disease might otherwise go undetected, according to a study in Nature Biomedical Engineering. Although the current test requires a sophisticated lab to perform, researchers are working to streamline it so it can be performed in the community and read with a smartphone. "TB is a disease found primarily in resource-limited areas, so the ideal is to create a smartphone-based method that could be used at the point-of-care in these settings," said senior study author Tony Hu, PhD, Weatherhead Presidential Chair in Biotechnology Innovation at Tulane University. TB is the second most common cause of infectious disease death worldwide, having only recently been supplanted by COVID-19. The disease is particularly deadly in young children, especially those with HIV.


Shape Modeling with Spline Partitions

Ge, Shufei, Wang, Shijia, Elliott, Lloyd

arXiv.org Machine Learning

Shape modelling (with methods that output shapes) is a new and important task in Bayesian nonparametrics and bioinformatics. In this work, we focus on Bayesian nonparametric methods for capturing shapes by partitioning a space using curves. In related work, the classical Mondrian process is used to partition spaces recursively with axis-aligned cuts, and is widely applied to multi-dimensional and relational data. The Mondrian process outputs hyper-rectangles. Recently, the random tessellation process was introduced as a generalization of the Mondrian process, partitioning a domain with non-axis-aligned cuts in an arbitrary-dimensional space and outputting polytopes. Motivated by these processes, we propose a novel parallelized Bayesian nonparametric approach to partitioning a domain with curves, enabling complex data shapes to be captured. We apply our method to an HIV-1-infected human macrophage image dataset, as well as to simulated datasets, to illustrate our approach. We compare against support vector machines, random forests, and state-of-the-art computer vision methods such as simple linear iterative clustering (SLIC) superpixel image segmentation. We develop an R package that is available at \url{https://github.com/ShufeiGe/Shape-Modeling-with-Spline-Partitions}.
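The axis-aligned cuts of the classical Mondrian process mentioned in the abstract can be illustrated with a short recursive sketch. This is a simplification for intuition only: it draws a cut cost from an exponential clock with rate equal to the box's linear dimension, stops when the budget runs out, and returns hyper-rectangular leaves:

```python
import random

def mondrian_cuts(box, budget, rng=random.Random(0)):
    """Recursively split an axis-aligned box with random axis-aligned cuts.

    box: list of (lo, hi) intervals, one per dimension.
    budget: remaining time; cut cost ~ Exp(sum of side lengths).
    Returns the leaf boxes (hyper-rectangles) of the partition.
    """
    lengths = [hi - lo for lo, hi in box]
    cost = rng.expovariate(sum(lengths))
    if cost > budget:
        return [box]                # clock expired: this box is a leaf
    # Pick the cut dimension with probability proportional to side length
    d = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    cut = rng.uniform(lo, hi)
    left, right = box.copy(), box.copy()
    left[d], right[d] = (lo, cut), (cut, hi)
    remaining = budget - cost
    return mondrian_cuts(left, remaining, rng) + mondrian_cuts(right, remaining, rng)

leaves = mondrian_cuts([(0.0, 1.0), (0.0, 1.0)], budget=2.0)
```

The random tessellation process and the spline-partition approach of the paper generalize exactly this recursion, replacing the axis-aligned cut with a non-axis-aligned hyperplane or a curve.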


Self adversarial attack as an augmentation method for immunohistochemical stainings

Vasiljević, Jelica, Feuerhake, Friedrich, Wemmert, Cédric, Lampert, Thomas

arXiv.org Artificial Intelligence

It has been shown that unpaired image-to-image translation methods constrained by cycle-consistency hide the information necessary for accurate input reconstruction as imperceptible noise. We demonstrate that, when applied to histopathology data, this hidden noise appears to be related to stain-specific features, and we show that this is the case for two immunohistochemical stainings during translation to Periodic acid-Schiff (PAS), a histochemical staining method commonly applied in renal pathology. Moreover, by perturbing this hidden information, the translation models produce different, plausible outputs. We demonstrate that this property can be used as an augmentation method which, in the case of supervised glomeruli segmentation, leads to improved performance.
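The augmentation principle described above — disturbing the imperceptible information a cycle-consistent translator relies on, so that repeated translations of the same tissue yield different, plausible stain renderings — can be sketched as follows. The `translate` function here is a hypothetical stand-in for a trained translation network, and the plain additive-noise perturbation is a simplification of the paper's method:

```python
import numpy as np

def self_adversarial_augment(image, translate, noise_std=0.01, n_variants=3, seed=0):
    """Generate augmented stain translations by perturbing hidden noise.

    `translate` maps a source-stain image (values in [0, 1]) to the target
    stain (e.g. PAS). Low-amplitude perturbations disturb the imperceptible
    information hidden by the cycle-consistency objective, so each call to
    the translator produces a different, plausible output for training.
    """
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        perturbed = image + rng.normal(0.0, noise_std, size=image.shape)
        variants.append(translate(np.clip(perturbed, 0.0, 1.0)))
    return variants

# Illustration with an identity "translator" on a flat dummy image
img = np.full((8, 8), 0.5)
outs = self_adversarial_augment(img, translate=lambda x: x)
```

In the segmentation setting, each variant would be paired with the original ground-truth mask, enlarging the training set without new annotations.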