AITopics | cell type

Collaborating Authors

cell type

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

Chang, Irene, Zikry, Tarek M., Allen, Genevera I.

arXiv.org Machine LearningMay-26-2026

Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet, popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on visual appeal alone and without rigorous quantitative validation. A major reason is that manifold embeddings typically do not provide an out-of-sample map nor an inverse back to the original feature space; this makes held-out validation, the gold standard in supervised learning, all but impossible. To address these challenges, we develop a novel framework, MEDAL (Manifold Embedding Distillation via Autoencoder Learning), which distills a fitted manifold embedding into a reusable encoder--decoder model. MEDAL trains a constrained autoencoder whose bottleneck exactly matches any teacher embedding while the decoder reconstructs the original input; this yields an explicit map for new samples, an approximate inverse, and a pointwise reconstruction-based measure of distortion in the manifold space. This converts static manifold embeddings into models that can be evaluated on held-out data, enabling quantitative validation including comparing different dimension reduction methods as well as hyperparameter tuning. Across multiple benchmark and scientific case studies, we show that MEDAL enables held-out validation to determine optimal manifold embeddings and hyperparameters, reveals biologically coherent regions that are difficult to preserve in two dimensional embeddings, and detects distribution shift when new samples are mapped into a fixed reference manifold. MEDAL provides a general validation wrapper to any existing dimension reduction technique that will improve the rigor and

artificial intelligence, machine learning, reconstruction error, (18 more...)

arXiv.org Machine Learning

2605.24244

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Donor-Aware scRNA-seq Benchmarks for IBD Classification

Muhire, Jonathan

arXiv.org Machine LearningMay-6-2026

Donor-level disease classification from single-cell RNA sequencing (scRNA-seq) requires strict donor-aware cross-validation: naive pipelines that split cells randomly conflate training and test donors, inflating reported performance through pseudoreplication. We present a donor-aware benchmark evaluating three feature representations across two independent IBD cohorts: centered log-ratio (CLR) transformed cell-type composition, GatedStructuralCFN dependency embeddings, and scVI variational autoencoder latent embeddings. The cohorts are the SCP259 ulcerative colitis atlas (UC vs. Healthy, n=30 donors, 51 cell types) and the Kong 2023 Crohn's disease atlas (CD vs. Healthy, n=71 donors, 55-68 cell types across three intestinal regions). Compartment-stratified CLR composition achieves AUROC 0.956 +/- 0.061 on SCP259; GatedStructuralCFN on the same features achieves 0.978 +/- 0.050. In the Kong cohort, CFN achieves its best performance in the colon region (0.960 +/- 0.055 after feature filtering), exceeding linear CLR (0.900 +/- 0.100), while terminal ileum classification is dominated by linear models (CatBoost CLR 0.967 +/- 0.075 vs. CFN 0.811 +/- 0.164). Cross-dataset transfer (CD->UC, four shared cell types) achieves AUC 0.833 with XGBoost CLR; the reverse direction performs at chance. CFN edge stability analysis shows that compartment-wise composition eliminates spurious unit-sum-induced instability present in global composition (Jaccard 0.026 vs. top-20 recurrence 1.0). CFN shows a consistent numerical advantage over linear models in the colon region of CD (AUROC 0.960 vs. 0.900), though no inter-method comparison reached statistical significance at n<=34 donors per region. Compartment-aware feature construction is critical for both classification performance and structural interpretability. Code: https://github.com/Jonathan-321/sfn-scrna-study

artificial intelligence, composition, machine learning, (17 more...)

arXiv.org Machine Learning

2605.03281

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.88)
Health & Medicine > Therapeutic Area > Gastroenterology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

Add feedback

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

Neural Information Processing SystemsApr-25-2026, 22:19:22 GMT

Here we describe the additional details of FlaMBé's curation including structured guidelines for each annotation task, corpus curation, and file assembly. All manual curation in FlaMBé was conducted by three annotators who have doctorate level expertise in computational biology. For named entity tagging annotations a set of structured guidelines were followed to ensure consistency. The guidelines given to reviewers are in the annotator guidelines section below. B.1 Tissue and cell type entities Generally, all terms, related synonyms, and text entities that can be mapped to an entry from the tissue, organ, body part, fluid, and cell type branches of the NCI thesaurus were labeled. Instead of a rigid vocabulary fixed on exact matches of NCIThesaurus (NCIT) terms and synonyms, annotators were encouraged to tag any word with the same meaning as an ontology term. For example, "Pancreatic ductal adenocarcinoma" describes cancer of the pancreas, which can be related back to the NCI Thesaurus, and thus was tagged as a "TISSUE". An initial set of rules was provided to each annotator. When one annotator encountered a corner case (e.g., "is neuron a tissue or cell type?") all annotators discussed, reached a consensus, then added the corner case to the set of annotation rules.

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.54)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)

Add feedback

23e3d86c9a19d0caf2ec997e73dfcfbd-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-25-2026, 22:19:19 GMT

data mining, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.46)

Genre:

Workflow (0.50)
Research Report (0.47)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.96)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
(2 more...)

Add feedback

1e5eeb40a3fce716b244599862fd2200-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 00:47:22 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.15)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data

Iwashita, Yuichiro, Abbasi, Ahtisham Fazeel, Kise, Koichi, Dengel, Andreas, Asim, Muhammad Nabeel

arXiv.org Machine LearningApr-15-2026

Background: Single-cell RNA sequencing (scRNA-seq) enables gene expression profiling at cellular resolution but is inherently affected by sparsity caused by dropout events, where expressed genes are recorded as zeros due to technical limitations. These artifacts distort gene expression distributions and compromise downstream analyses. Numerous imputation methods have been proposed to recover latent transcriptional signals. These methods range from traditional statistical models to deep learning (DL)-based methods. However, their comparative performance remains unclear, as existing benchmarks evaluate only a limited subset of methods, datasets, and downstream analyses. Results: We present a comprehensive benchmark of 15 scRNA-seq imputation methods spanning 7 methodological categories, including traditional and DL-based methods. Methods are evaluated across 30 datasets from 10 experimental protocols on 6 downstream analyses. Results show that traditional methods, such as model-based, smoothing-based, and low-rank matrix-based methods, generally outperform DL-based methods, including diffusion-based, GAN-based, GNN-based, and autoencoder-based methods. In addition, strong performance in numerical gene expression recovery does not necessarily translate into improved biological interpretability in downstream analyses, including cell clustering, differential expression analysis, marker gene analysis, trajectory analysis, and cell type annotation. Furthermore, method performance varies substantially across datasets, protocols, and downstream analyses, with no single method consistently outperforming others. Conclusions: Our findings provide practical guidance for selecting imputation methods tailored to specific analytical objectives and underscore the importance of task-specific evaluation when assessing imputation performance in scRNA-seq data analysis.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2603.24626

Country:

Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)
(4 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Designing Cell-Type-Specific Promoter Sequences Using Conservative Model-Based Optimization

Neural Information Processing SystemsMar-21-2026, 23:55:49 GMT

Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and untargeted cells. Model-based optimization (MBO) has emerged as an effective method to design biological sequences in an automated manner, and has recently been used in promoter design methods. However, these methods have only been tested using large training datasets that are expensive to collect, and focus on designing promoters for markedly different cell types, overlooking the complexities associated with designing promoters for closely related cell types that share similar regulatory features. Therefore, we introduce a comprehensive framework for utilizing MBO to design promoters in a data-efficient manner, with an emphasis on discovering promoters for similar cell types. We use conservative objective models (COMs) for MBO and highlight practical considerations such as best practices for improving sequence diversity, getting estimates of model uncertainty, and choosing the optimal set of sequences for experimental validation. Using three leukemia cell lines (Jurkat, K562, and THP1), we show that our approach discovers many novel cell-type-specific promoters after experimentally validating the designed sequences. For K562 cells, in particular, we discover a promoter that has 75.85\% higher cell-type-specificity than the best promoter from the initial dataset used to train our models.

artificial intelligence, machine learning, promoter, (9 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.38)

Add feedback

Disentangling the Roles of Distinct Cell Classes with Cell-Type Dynamical Systems

Neural Information Processing SystemsMar-19-2026, 15:57:30 GMT

Latent dynamical systems have been widely used to characterize the dynamics of neural population activity in the brain. However, these models typically ignore the fact that the brain contains multiple cell types. This limits their ability to capture the functional roles of distinct cell classes, and to predict the effects of cell-specific perturbations on neural activity or behavior. To overcome these limitations, we introduce the `cell-type dynamical systems (CTDS) model. This model extends latent linear dynamical systems to contain distinct latent variables for each cell class, with biologically inspired constraints on both dynamics and emissions.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.38)

Add feedback

Reproducibility of predictive networks for mouse visual cortex

Neural Information Processing SystemsMar-18-2026, 03:19:20 GMT

Deep predictive models of neuronal activity have recently enabled several new discoveries about the selectivity and invariance of neurons in the visual cortex.These models learn a shared set of nonlinear basis functions, which are linearly combined via a learned weight vector to represent a neuron's function.Such weight vectors, which can be thought as embeddings of neuronal function, have been proposed to define functional cell types via unsupervised clustering.However, as deep models are usually highly overparameterized, the learning problem is unlikely to have a unique solution, which raises the question if such embeddings can be used in a meaningful way for downstream analysis.In this paper, we investigate how stable neuronal embeddings are with respect to changes in model architecture and initialization. We find that $L_1$ regularization to be an important ingredient for structured embeddings and develop an adaptive regularization that adjusts the strength of regularization per neuron. This regularization improves both predictive performance and how consistently neuronal embeddings cluster across model fits compared to uniform regularization.To overcome overparametrization, we propose an iterative feature pruning strategy which reduces the dimensionality of performance-optimized models by half without loss of performance and improves the consistency of neuronal embeddings with respect to clustering neurons.Our results suggest that to achieve an objective taxonomy of cell types or a compact representation of the functional landscape, we need novel architectures or learning techniques that improve identifiability.

artificial intelligence, machine learning, regularization, (6 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

cell type

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

Donor-Aware scRNA-seq Benchmarks for IBD Classification

db68f1c25678f72561ab7c97ce15d912-Supplemental-Conference.pdf

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

23e3d86c9a19d0caf2ec997e73dfcfbd-Paper-Datasets_and_Benchmarks.pdf

1e5eeb40a3fce716b244599862fd2200-Paper.pdf

A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data

Designing Cell-Type-Specific Promoter Sequences Using Conservative Model-Based Optimization

Disentangling the Roles of Distinct Cell Classes with Cell-Type Dynamical Systems

Reproducibility of predictive networks for mouse visual cortex