Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE

Hozumi, Yuta; Wei, Guo-Wei

arXiv.org Artificial Intelligence 

Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that profiles the transcriptome of individual cells within a tissue or organ, with the aim of understanding gene expression, gene regulation, cell-cell interaction, spatial transcriptomics, signal transduction pathways, and more [1]. The typical scRNA-seq workflow involves cell isolation, RNA extraction, library preparation, sequencing, and data analysis. One of the key challenges in scRNA-seq analysis is the enormous amount of data generated, which can be complex, nonuniform, noisy, unlabeled, and excessively high-dimensional. A typical data analysis pipeline involves data preprocessing, gene expression quantification, normalization and batch correction, dimensionality reduction, cell type identification, differential gene expression analysis, and pathway and functional analysis [2-7]. Principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) are among the most popular approaches for dimensionality reduction, clustering, and visualization of scRNA-seq data. PCA is a classical technique for dimensionality reduction and visualization. It identifies the dominant patterns and correlations in a high-dimensional dataset and expresses the data in terms of new, mutually orthogonal components, each a linear combination of the original variables. The first few of these components, which capture the most variance, are usually the ones retained as principal components; they can be used to visualize scRNA-seq data in a lower-dimensional space or to identify important gene expression patterns.
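The PCA step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `pca_project`, the synthetic Poisson count matrix, and the log1p transform are all assumptions made for illustration. The code centers a cells-by-genes matrix, takes its singular value decomposition, and keeps the first two orthogonal components.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X (cells x genes) onto the top principal components.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Center each gene (column) so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix: the rows of Vt are orthonormal principal axes,
    # ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each cell in the reduced space.
    scores = X_centered @ components.T
    # Fraction of the total variance captured by each retained component.
    explained_variance_ratio = (S ** 2)[:n_components] / np.sum(S ** 2)
    return scores, components, explained_variance_ratio

# Synthetic stand-in for an scRNA-seq count matrix (200 cells x 50 genes).
# Real data would come from a preprocessing and normalization pipeline.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(200, 50)).astype(float)
log_counts = np.log1p(counts)  # assumed log transform, a common choice

scores, components, evr = pca_project(log_counts, n_components=2)
print(scores.shape)  # one 2-D point per cell
```

The two columns of `scores` could then be plotted directly for visualization, or a larger number of components could be kept as input to a nonlinear method such as t-SNE or UMAP.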
