Artificial Intelligence for CRISPR Guide RNA Design: Explainable Models and Off-Target Safety

Abbaszadeh, Alireza; Shahlai, Armita

arXiv.org Artificial Intelligence 

The CRISPR-Cas genome editing system has rapidly become an indispensable tool across biotechnology and medicine, enabling targeted DNA modifications with unprecedented ease. A single-guide RNA (sgRNA, or simply gRNA) directs the Cas nuclease (such as Cas9 or Cas12a) to a complementary genomic sequence. There the nuclease, or an engineered derivative of it, introduces a double-strand break or a targeted nucleotide change. The efficiency and specificity of this process are largely dictated by the gRNA sequence and its interactions with both the target DNA and the cellular environment. Designing optimal gRNAs is therefore critical for successful editing outcomes. Early gRNA design relied on empirical rules and relatively simple machine learning models, but these approaches often struggled to capture the complex determinants of gRNA activity and off-target effects. In recent years, artificial intelligence (AI), particularly deep learning, has been used to overcome these limitations. These models learn predictive features from large-scale CRISPR datasets and outperform earlier rule-based methods in predicting guide efficacy [1, 2]. Deep learning models can ingest not only the gRNA and target DNA sequences but also additional contextual information (e.g.