 cell type classification


AttriGen: Automated Multi-Attribute Annotation for Blood Cell Datasets

Houmaidi, Walid, Sabiri, Youssef, Iguenfer, Fatima Zahra, Abouaomar, Amine

arXiv.org Artificial Intelligence

Abstract: We introduce AttriGen, a novel framework for automated, fine-grained multi-attribute annotation in computer vision, with a particular focus on cell microscopy, where multi-attribute classification remains underrepresented compared to traditional cell type categorization. Using two complementary datasets, the Peripheral Blood Cell (PBC) dataset containing eight distinct cell types and the WBC Attribute Dataset (WBCAtt) containing their corresponding 11 morphological attributes, we propose a dual-model architecture that combines a CNN for cell type classification with a Vision Transformer (ViT) for multi-attribute classification, achieving a new benchmark of 94.62% accuracy. Our experiments demonstrate that AttriGen significantly enhances model interpretability and offers substantial time and cost efficiency relative to conventional full-scale human annotation. Thus, our framework establishes a new paradigm that can be extended to other computer vision classification tasks by effectively automating the expansion of multi-attribute labels. Early diagnosis hinges on microscopic review of blood smears, a task that is slow, labor-intensive, and increasingly hampered by shortages of laboratory experts [2], [3].


Towards Applying Large Language Models to Complement Single-Cell Foundation Models

Palayew, Steven, Wang, Bo, Bader, Gary

arXiv.org Artificial Intelligence

Single-cell foundation models such as scGPT represent a significant advancement in single-cell omics, with an ability to achieve state-of-the-art performance on various downstream biological tasks. However, these models are inherently limited in that a vast amount of information in biology exists as text, which they are unable to leverage. Several recent works have therefore proposed using LLMs as an alternative to single-cell foundation models, achieving competitive results. However, there is little understanding of what factors drive this performance, and the focus has been strongly on using LLMs as an alternative to single-cell foundation models rather than as a complement to them. In this study, we therefore investigate what biological insights contribute toward the performance of LLMs when applied to single-cell data, and introduce scMPT, a model that leverages synergies between scGPT and single-cell representations from LLMs that capture these insights. scMPT demonstrates stronger, more consistent performance than either of its component models, which frequently have large performance gaps between each other across datasets. We also experiment with alternate fusion methods, demonstrating the potential of combining specialized reasoning models with scGPT to improve performance. This study ultimately showcases the potential for LLMs to complement single-cell foundation models and drive improvements in single-cell analysis.


Lower-dimensional projections of cellular expression improves cell type classification from single-cell RNA sequencing

Umar, Muhammad, Asif, Muhammad, Mahmood, Arif

arXiv.org Artificial Intelligence

Single-cell RNA sequencing (scRNA-seq) enables the study of cellular diversity at the single-cell level. It provides a global view of cell-type specification during the onset of biological mechanisms such as developmental processes and human organogenesis. Various statistical, machine learning, and deep learning-based methods have been proposed for cell-type classification. Most of these methods utilize unsupervised lower-dimensional projections obtained from large reference data. In this work, we propose a reference-based method for cell type classification, called EnProCell. EnProCell first computes lower-dimensional projections that capture both high variance and class separability through an ensemble of principal component analysis and multiple discriminant analysis. In the second phase, EnProCell trains a deep neural network on the lower-dimensional representation of the data to classify cell types. The proposed method outperformed existing state-of-the-art methods when tested on four different datasets produced by different single-cell sequencing technologies. EnProCell showed higher accuracy (98.91) and F1 score (98.64) than other methods for predicting reference from reference datasets. Similarly, EnProCell also showed better performance than existing methods in predicting cell types for data with unknown cell types (query) from reference datasets (accuracy: 99.52; F1 score: 99.07). In addition to improved performance, the proposed methodology is simple and does not require more computational resources and time. EnProCell is available at https://github.com/umar1196/EnProCell.
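The two-phase idea in this abstract (an ensemble of variance-preserving and class-separating projections, followed by a classifier) can be illustrated with a minimal NumPy sketch. This is not the EnProCell code from the linked repository. The function names, the synthetic data, the component counts, and the regularization constant are all assumptions made for illustration, and a nearest-centroid rule stands in for the deep neural network the abstract describes.

```python
import numpy as np

def pca_projection(X, n_components):
    """Top principal directions (features x n_components), capturing high variance."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T

def mda_projection(X, y, n_components, reg=1e-6):
    """Discriminant directions maximizing between-class over within-class scatter."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)  # assumed small ridge term so Sw is invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_components]].real

def fit_ensemble(X_ref, y_ref, n_pca, n_mda):
    """Concatenate PCA and MDA directions learned on the reference data."""
    mean = X_ref.mean(axis=0)
    W = np.hstack([pca_projection(X_ref, n_pca),
                   mda_projection(X_ref, y_ref, n_mda)])
    return mean, W

# Synthetic stand-in for expression data: 3 cell types, 20 "genes" (assumption).
rng = np.random.default_rng(0)
type_means = rng.normal(0.0, 3.0, size=(3, 20))
y_ref = np.repeat(np.arange(3), 50)
X_ref = type_means[y_ref] + rng.normal(size=(150, 20))
y_query = np.repeat(np.arange(3), 30)
X_query = type_means[y_query] + rng.normal(size=(90, 20))

mean, W = fit_ensemble(X_ref, y_ref, n_pca=2, n_mda=2)
Z_ref = (X_ref - mean) @ W      # reference cells in the projected space
Z_query = (X_query - mean) @ W  # query cells use the reference projection

# Stand-in classifier: nearest class centroid in the projected space.
centroids = np.stack([Z_ref[y_ref == c].mean(axis=0) for c in range(3)])
dists = np.linalg.norm(Z_query[:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)
acc = float((pred == y_query).mean())
```

The projection is fitted on the reference data only and then reused for the query cells, which mirrors the reference-based setting described above. With at most C cell types, multiple discriminant analysis yields at most C - 1 informative directions, which is why the sketch requests two for three classes.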


scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data

Yang, Rui, Dai, Wenrui, Li, Chenglin, Zou, Junni, Wu, Dapeng, Xiong, Hongkai

arXiv.org Artificial Intelligence

Single-cell RNA sequencing (scRNA-seq) technology provides high-throughput gene expression data to study the cellular heterogeneity and dynamics of complex organisms. Graph neural networks (GNNs) have been widely used for automatic cell type classification, which is a fundamental problem to solve in scRNA-seq analysis. However, existing methods do not sufficiently exploit both gene-gene and cell-cell relationships, and thus the true potential of GNNs is not realized. In this work, we propose a bilevel graph representation learning method, named scBiGNN, to simultaneously mine the relationships at both gene and cell levels for more accurate single-cell classification. Specifically, scBiGNN comprises two GNN modules to identify cell types. A gene-level GNN is established to adaptively learn gene-gene interactions and cell representations via the self-attention mechanism, and a cell-level GNN builds on the cell-cell graph that is constructed from the cell representations generated by the gene-level GNN. To tackle the scalability issue for processing a large number of cells, scBiGNN adopts an Expectation Maximization (EM) framework in which the two modules are alternately trained via the E-step and M-step to learn from each other. Through this interaction, the gene- and cell-level structural information is integrated to gradually enhance the classification performance of both GNN modules. Experiments on benchmark datasets demonstrate that our scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
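One step mentioned in this abstract, constructing a cell-cell graph from cell representations, can be sketched as follows. The abstract does not say how the graph is built, so the k-nearest-neighbor rule, the cosine similarity, the function name `knn_cell_graph`, and the toy embeddings are assumptions for illustration only, not the scBiGNN implementation.

```python
import numpy as np

def knn_cell_graph(cell_repr, k):
    """Symmetric k-nearest-neighbor adjacency built from cell embeddings.

    Assumption: neighbors are chosen by cosine similarity; the source
    does not specify the construction rule.
    """
    norms = np.linalg.norm(cell_repr, axis=1, keepdims=True)
    unit = cell_repr / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a cell is never its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    n = cell_repr.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = True
    return adj | adj.T  # make the graph undirected

# Toy embeddings: two groups of three cells pointing in different directions.
emb = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],
])
adj = knn_cell_graph(emb, k=2)
```

In a bilevel setup like the one described, such an adjacency matrix would be recomputed whenever the gene-level module produces updated cell representations, so the cell-level module always sees a graph that reflects the current embeddings.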


Congratulations to the #ECAI2023 outstanding paper award winners

AIHub

The 26th European Conference on Artificial Intelligence (ECAI 2023) took place from 30 September – 4 October in Krakow, Poland. On the final day of the conference, the outstanding paper awards were announced. There were two winners in the ECAI 2023 Outstanding Paper category, and one winner in the Outstanding Paper for AI in Social Good category. Abstract: Learning effective strategies in sparse reward tasks is one of the fundamental challenges in reinforcement learning. This becomes extremely difficult in multi-agent environments, as the concurrent learning of multiple agents induces the non-stationarity problem and sharply increased joint state space.


scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain

Oh, Gyutaek, Choi, Baekgyu, Jung, Inkyung, Ye, Jong Chul

arXiv.org Artificial Intelligence

Single-cell RNA sequencing (scRNA-seq) has made significant strides in unraveling the intricate cellular diversity within complex tissues. This is particularly critical in the brain, which presents a greater diversity of cell types than other tissue types, to gain a deeper understanding of brain function within various cellular contexts. However, analyzing scRNA-seq data remains a challenge due to inherent measurement noise stemming from dropout events and the limited utilization of extensive gene expression information. In this work, we introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. Specifically, inspired by the recent Hyena operator, we design a novel Transformer architecture called single-cell Hyena (scHyena) that is equipped with a linear adaptor layer, positional encoding via gene embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data. In particular, our model learns generalizable features of cells and genes through pre-training scHyena using the full length of scRNA-seq data. We demonstrate the superior performance of scHyena compared to other benchmark methods in downstream tasks, including cell type classification and scRNA-seq imputation. Single-cell RNA sequencing (scRNA-seq) is a powerful technique for profiling gene expression levels at single-cell resolution, enabling molecular characterization of complex biological systems in both normal and disease states (Saliba et al., 2014; Rood et al., 2022). Through scRNA-seq, several key objectives can be achieved, including cell type annotation (Li et al., 2020; Hao et al., 2021), the discovery of novel cell types (Villani et al., 2017), the identification of marker genes (Jaitin et al., 2014), and the analysis of cellular heterogeneity (Papalexi & Satija, 2018; Kinker et al., 2020).
It is worth noting that the brain exhibits a particularly diverse range of cell types compared to other tissues (Saunders et al., 2018; Hodge et al., 2019). Therefore, conducting scRNA-seq analysis in the brain is especially important to gain a deeper understanding of brain function within various cellular contexts.


Daily Digest

#artificialintelligence

To elucidate the genetics of coronary artery disease (CAD) in the Japanese population, researchers conducted a large-scale genome-wide association study of 168,228 individuals of Japanese ancestry (25,892 cases and 142,336 controls) with genotype imputation using a newly developed reference panel of Japanese haplotypes including 1,781 CAD cases and 2,636 controls. They detected eight new susceptibility loci and Japanese-specific rare variants contributing to disease severity and increased cardiovascular mortality. They then conducted a trans-ancestry meta-analysis and discovered 35 additional new loci. Using the meta-analysis results, they derived a polygenic risk score (PRS) for CAD, which outperformed those derived from either Japanese or European genome-wide association studies. Researchers manually curated a set of 255 splice events detected in a large-scale tissue-based proteomics experiment and found that more than a third had evidence of significant tissue-specific differences.