Active learning framework leveraging transcriptomics identifies modulators of disease phenotypes Science

We introduced a perturbational single-cell RNA sequencing (scRNA-seq) dataset with 1.2 million cells spanning 88 perturbations across 10 primary and cancer cell lines. Using this dataset along with public perturbational omics data (held-out CMap and SciPlex signatures), we showed that DrugReflector robustly prioritizes compounds from transcriptional signatures even outside of its training context, consistently outperforming state-of-the-art approaches. Through two hematopoietic campaigns using single-cell atlas–defined cell state transitions as model inputs, we identified inducers of megakaryocyte and erythroid differentiation, achieving hit rates 10-fold higher than a random baseline. To assess generalizability, we additionally deployed DrugReflector in two distinct oncology indications, recovering clinical standards of care and modulators of known indication-specific pathways. To further characterize and leverage the transcriptional drivers of megakaryocyte induction, we created a time-course scRNA-seq dataset of hematopoietic stem and progenitor cells with paired flow cytometry readouts for a range of transcriptionally and phenotypically active compounds.