ReCellTy: Domain-specific knowledge graph retrieval-augmented LLMs workflow for single-cell annotation

Han, Dezheng; Jia, Yibin; Chen, Ruxiao; Han, Wenjie; Guo, Shuaishuai; Wang, Jianbo

arXiv.org Artificial Intelligence 

These authors contributed equally to this work.

Abstract

To enable precise and fully automated cell type annotation with large language models (LLMs), we developed a graph-structured feature-marker database to retrieve entities linked to differential genes for cell reconstruction. We further designed a multi-task workflow to optimize the annotation process. Compared to general-purpose LLMs, our method improves human evaluation scores by up to 0.21 and semantic similarity by 6.1% across 11 tissue types, while more closely aligning with the cognitive logic of manual annotation.

Keywords: Cell type annotation, Graph RAG, Large language models, Graph data curation, Multi-task workflow, scRNA-seq

In single-cell RNA sequencing analysis, achieving precise cell type annotation through manual labeling typically requires two key steps: annotators retrieve relevant marker genes and integrate this information with their domain expertise to make informed decisions.
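The retrieval step described above can be illustrated with a minimal sketch. It assumes a toy two-hop graph (marker gene to feature entity to cell type) stored in plain dictionaries. The names `GENE_TO_FEATURES`, `FEATURE_TO_CELL_TYPES`, and `retrieve_candidates`, as well as the specific links, are hypothetical placeholders chosen for illustration and are not taken from the ReCellTy database or its schema.

```python
from collections import Counter

# Hypothetical toy graph: marker gene -> feature entities -> cell types.
# These links are illustrative placeholders, not the authors' curated data.
GENE_TO_FEATURES = {
    "CD3E": ["T cell receptor complex"],
    "CD8A": ["cytotoxic"],
    "GZMB": ["cytotoxic"],
    "MS4A1": ["B cell surface antigen"],
}
FEATURE_TO_CELL_TYPES = {
    "T cell receptor complex": ["T cell"],
    "cytotoxic": ["CD8+ T cell", "NK cell"],
    "B cell surface antigen": ["B cell"],
}


def retrieve_candidates(differential_genes):
    """Count, per candidate cell type, how many gene-feature links support it."""
    votes = Counter()
    for gene in differential_genes:
        # Genes absent from the graph contribute nothing.
        for feature in GENE_TO_FEATURES.get(gene, []):
            for cell_type in FEATURE_TO_CELL_TYPES.get(feature, []):
                votes[cell_type] += 1
    return votes


if __name__ == "__main__":
    # Differential genes of one hypothetical cluster.
    print(retrieve_candidates(["CD3E", "CD8A", "GZMB"]).most_common())
```

In a retrieval-augmented setup, the retrieved features and candidate cell types would be passed to the LLM as context for the final decision, which mirrors the second manual step of combining marker evidence with domain expertise. The simple vote count here is only a stand-in for that decision.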