A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Fang, Yin; Deng, Xinle; Liu, Kangwei; Zhang, Ningyu; Qian, Jingyang; Yang, Penghui; Fan, Xiaohui; Chen, Huajun

Jan-14-2025–arXiv.org Artificial Intelligence

Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. In the life sciences, single-cell RNA sequencing (scRNA-seq) data serves as the "language of cellular biology", capturing intricate gene expression patterns at the single-cell level. However, interacting with this "language" through conventional tools is often inefficient and unintuitive, posing challenges for researchers. To address these limitations, we present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. We construct a comprehensive multi-modal instruction dataset that pairs text-based instructions with scRNA-seq profiles from diverse tissues and species. Building on this, we develop a multi-modal cell language architecture capable of simultaneously interpreting and processing both modalities. InstructCell empowers researchers to accomplish critical tasks, such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction, using straightforward natural language commands. Extensive evaluations demonstrate that InstructCell consistently meets or exceeds the performance of existing single-cell foundation models, while adapting to diverse experimental conditions. More importantly, InstructCell provides an accessible and intuitive tool for exploring complex single-cell data, lowering technical barriers and enabling deeper biological insights.

large language model, machine learning, natural language

Country:
- Asia
  - China > Zhejiang Province (0.14)
  - Middle East > UAE (0.14)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Performance Analysis > Accuracy (0.68)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)
