AITopics

2605.03281

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.88)
Health & Medicine > Therapeutic Area > Gastroenterology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

arXiv.org Artificial IntelligenceNov-27-2025

MoRE: Batch-Robust Multi-Omics Representations from Frozen Pre-trained Transformers

Chen, Audrey Pei-Hsuan

Representation learning on multi-omics data is challenging due to extreme dimensionality, modality heterogeneity, and cohort-specific batch effects. While pre-trained transformer backbones have shown broad generalization capabilities in biological sequence modeling, their application to multi-omics integration remains underexplored. We present MoRE (Multi-Omics Representation Embedding), a framework that repurposes frozen pre-trained transformers to align heterogeneous assays into a shared latent space. Unlike purely generative approaches, MoRE employs a parameter-efficient fine-tuning (PEFT) strategy, prioritizing cross-sample and cross-modality alignment over simple sequence reconstruction. Specifically, MoRE attaches lightweight, modality-specific adapters and a task-adaptive fusion layer to the frozen backbone. It optimizes a masked modeling objective jointly with supervised contrastive and batch-invariant alignment losses, yielding structure-preserving embeddings that generalize across unseen cell types and platforms. We benchmark MoRE against established baselines, including scGPT, scVI, and Harmony with Scrublet, evaluating integration fidelity, rare population detection, and modality transfer. Our results demonstrate that MoRE achieves competitive batch robustness and biological conservation while significantly reducing trainable parameters compared to fully fine-tuned models. This work positions MoRE as a practical step toward general-purpose omics foundation models.

large language model, machine learning, natural language, (22 more...)

2511.20382

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Claye, Charlotte, Marschall, Pierre, Ouerdane, Wassila, Hudelot, Céline, Duquesne, Julien

Discovering Interpretable Biological Concepts in Single-cell RNA-seq Foundation Models

arXiv.org Artificial IntelligenceOct-31-2025

Single-cell RNA-seq foundation models achieve strong performance on downstream tasks but remain black boxes, limiting their utility for biological discovery. Recent work has shown that sparse dictionary learning can extract concepts from deep learning models, with promising applications in biomedical imaging and protein models. However, interpreting biological concepts remains challenging, as biological sequences are not inherently human-interpretable. We introduce a novel concept-based interpretability framework for single-cell RNA-seq models with a focus on concept interpretation and evaluation. We propose an attribution method with counterfactual perturbations that identifies genes that influence concept activation, moving beyond correlational approaches like differential expression analysis. We then provide two complementary interpretation approaches: an expert-driven analysis facilitated by an interactive interface and an ontology-driven method with attribution-based biological pathway enrichment. Applying our framework to two well-known single-cell RNA-seq models from the literature, we interpret concepts extracted by Top-K Sparse Auto-Encoders trained on two immune cell datasets. With a domain expert in immunology, we show that concepts improve interpretability compared to individual neurons while preserving the richness and informativeness of the latent representations. This work provides a principled framework for interpreting what biological knowledge foundation models have encoded, paving the way for their use for hypothesis generation and discovery.

artificial intelligence, deep learning, machine learning, (16 more...)

2510.25807

Country: Europe > Switzerland (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial IntelligenceAug-18-2025

Quantum-Boosted High-Fidelity Deep Learning

Wang, Feng-ao, Chen, Shaobo, Xuan, Yao, Liu, Junwei, Gao, Qi, Zhu, Hongdong, Hou, Junjie, Yuan, Lixin, Cheng, Jinyu, Yi, Chenxin, Wei, Hai, Ma, Yin, Xu, Tao, Wen, Kai, Li, Yixue

A fundamental limitation of probabilistic deep learning is its predominant reliance on Gaussian priors. This simplistic assumption prevents models from accurately capturing the complex, non-Gaussian landscapes of natural data, particularly in demanding domains like complex biological data, severely hindering the fidelity of the model for scientific discovery. The physically-grounded Boltzmann distribution offers a more expressive alternative, but it is computationally intractable on classical computers. To date, quantum approaches have been hampered by the insufficient qubit scale and operational stability required for the iterative demands of deep learning. Here, we bridge this gap by introducing the Quantum Boltzmann Machine-Variational Autoencoder (QBM-VAE), a large-scale and long-time stable hybrid quantum-classical architecture. Our framework leverages a quantum processor for efficient sampling from the Boltzmann distribution, enabling its use as a powerful prior within a deep generative model. Applied to million-scale single-cell datasets from multiple sources, the QBM-VAE generates a latent space that better preserves complex biological structures, consistently outperforming conventional Gaussian-based deep learning models like VAE and SCVI in essential tasks such as omics data integration, cell-type classification, and trajectory inference. It also provides a typical example of introducing a physics priori into deep learning to drive the model to acquire scientific discovery capabilities that breaks through data limitations. This work provides the demonstration of a practical quantum advantage in deep learning on a large-scale scientific problem and offers a transferable blueprint for developing hybrid quantum AI models.

artificial intelligence, deep learning, machine learning, (19 more...)

2508.1119

Country: Asia > China > Guangdong Province (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bendidi, Ihab, Whitfield, Shawn, Kenyon-Dean, Kian, Yedder, Hanene Ben, Mesbahi, Yassir El, Noutahi, Emmanuel, Denton, Alisandra K.

Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all

arXiv.org Machine LearningNov-4-2024

Understanding the relationships among genes, compounds, and their interactions in living organisms remains limited due to technological constraints and the complexity of biological data. Deep learning has shown promise in exploring these relationships using various data types. However, transcriptomics, which provides detailed insights into cellular states, is still underused due to its high noise levels and limited data availability. Recent advancements in transcriptomics sequencing provide new opportunities to uncover valuable insights, especially with the rise of many new foundation models for transcriptomics, yet no benchmark has been made to robustly evaluate the effectiveness of these rising models for perturbation analysis. This article presents a novel biologically motivated evaluation framework and a hierarchy of perturbation analysis tasks for comparing the performance of pretrained foundation models to each other and to more classical techniques of learning from transcriptomics data. We compile diverse public datasets from different sequencing techniques and cell lines to assess models performance. Our approach identifies scVI and PCA to be far better suited models for understanding biological perturbations in comparison to existing foundation models, especially in their application in real-world scenarios.

center scaling 0, dataset, perturbation, (13 more...)

2410.13956

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Republic of Türkiye > Corum Province > Corum (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.45)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Zhao, Sarah, Ravuri, Aditya, Lalchand, Vidhi, Lawrence, Neil D.

Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

arXiv.org Machine LearningMay-6-2024

Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets and effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures as we demonstrate on an innate immunity dataset.

dataset, genomic exploration workshop, latent space, (14 more...)

2405.03879

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Netherlands > South Holland > Leiden (0.05)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.71)
Health & Medicine > Therapeutic Area (0.49)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Stirn, Andrew, Knowles, David A.

The VampPrior Mixture Model

arXiv.org Artificial IntelligenceFeb-6-2024

These methods analysis by performing integration and clustering are notorious for finding structure where no structure exists simultaneously. We adapt the VampPrior (Tomczak (Chari & Pachter, 2023). When the embedding function & Welling, 2018) into a Dirichlet process does not account for systematic shifts in expression profiling Gaussian mixture model, resulting in the Vamp-between datasets and/or batches that use different scRNAseq Prior Mixture Model (VMM), a novel prior for technologies, misleading structure can arise, confounding DLVMs. We propose an inference procedure that standard analysis pipelines. Accordingly, Lähnemann alternates between variational inference and Empirical et al. (2020) identify atlas-level integration as one of the Bayes to cleanly distinguish variational grand challenges of single-cell data science.

dataset, scvi, vmm, (16 more...)

2402.04412

Country:

North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(2 more...)

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Data Science (0.66)

arXiv.org Machine LearningMay-6-2019

A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements

Lopez, Romain, Nazaret, Achille, Langevin, Maxime, Samaran, Jules, Regier, Jeffrey, Jordan, Michael I., Yosef, Nir

Spatial studies of transcriptome provide biologists with gene expression maps of heterogeneous and complex tissues. However, most experimental protocols for spatial transcriptomics suffer from the need to select beforehand a small fraction of genes to be quantified over the entire transcriptome. Standard single-cell RNA sequencing (scRNA-seq) is more prevalent, easier to implement and can in principle capture any gene but cannot recover the spatial location of the cells. In this manuscript, we focus on the problem of imputation of missing genes in spatial transcriptomic data based on (unpaired) standard scRNA-seq data from the same biological tissue. Building upon domain adaptation work, we propose gimVI, a deep generative model for the integration of spatial transcriptomic data and scRNA-seq data that can be used to impute missing genes. After describing our generative model and an inference procedure for it, we compare gimVI to alternative methods from computational biology or domain adaptation on real datasets and outperform Seurat Anchors, Liger and CORAL to impute held-out genes.

artificial intelligence, deep learning, machine learning, (19 more...)

1905.02269

Country:

Asia > Middle East > Jordan (0.05)
North America > United States (0.04)
Europe > Portugal > Castelo Branco > Castelo Branco (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Machine LearningJan-16-2018

A deep generative model for gene expression profiles from single-cell RNA sequencing

Lopez, Romain, Regier, Jeffrey, Cole, Michael, Jordan, Michael, Yosef, Nir

We propose a probabilistic model for interpreting gene expression levels that are observed through single-cell RNA sequencing. In the model, each cell has a low-dimensional latent representation. Additional latent variables account for technical effects that may erroneously set some observations of gene expression levels to zero. Conditional distributions are specified by neural networks, giving the proposed model enough flexibility to fit the data well. We use variational inference and stochastic optimization to approximate the posterior distribution. The inference procedure scales to over one million cells, whereas competing algorithms do not. Even for smaller datasets, for several tasks, the proposed procedure outperforms state-of-the-art methods like ZIFA and ZINB-WaVE. We also extend our framework to account for batch effects and other confounding factors, and propose a Bayesian hypothesis test for differential expression that outperforms DESeq2.

artificial intelligence, deep learning, machine learning, (18 more...)

1709.02082

Country: North America > United States > California (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)

Wang, Pengyu, Blunsom, Phil

Stochastic Collapsed Variational Inference for Sequential Data

arXiv.org Machine LearningDec-5-2015

Stochastic variational inference for collapsed models has recently been successfully applied to large scale topic modelling. In this paper, we propose a stochastic collapsed variational inference algorithm in the sequential data setting. Our algorithm is applicable to both finite hidden Markov models and hierarchical Dirichlet process hidden Markov models, and to any datasets generated by emission distributions in the exponential family. Our experiment results on two discrete datasets show that our inference is both more efficient and more accurate than its uncollapsed version, stochastic variational inference.

artificial intelligence, inference, machine learning, (16 more...)

1512.01666

Country: North America > United States > Virginia (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)