DYNA: Disease-Specific Language Model for Variant Pathogenicity
arXiv.org Artificial Intelligence
Classifying genetic variants as pathogenic or benign remains a challenge in clinical genetics. Recently, genomic foundation models have improved generic variant effect prediction (VEP) accuracy through weakly supervised or unsupervised training. However, these VEPs are not disease-specific, limiting their adaptation at the point of care. To address this problem, we propose DYNA: disease-specificity fine-tuning via a Siamese neural network, broadly applicable to all genomic foundation models, for more effective variant effect prediction in disease-specific contexts. We evaluate DYNA on two distinct disease-relevant tasks. For coding VEPs, we focus on various cardiovascular diseases, where gene-disease relationships of loss-of-function vs. gain-of-function dictate disease-specific VEP. For non-coding VEPs, we apply DYNA to an essential post-transcriptional regulatory axis of RNA splicing, the most common non-coding pathogenic mechanism in established clinical VEP guidelines. The DYNA fine-tuned models show superior performance on the held-out rare variant test set, and the results are further replicated on large, clinically relevant variant annotations in ClinVar. Thus, DYNA offers a potent disease-specific variant effect prediction method, excelling in intra-gene generalization and generalization to unseen genetic variants, making it particularly valuable for disease associations and clinical applicability.
Clinical variant interpretation is transforming precision medicine, yet limitations remain that prevent its further adaptation and utility [1]. Following a disease diagnosis, the identification and classification of pathogenic vs. benign genetic variants has important clinical implications. The outcome of clinical variant interpretation provides a basis for clinical screening [2, 3] and genetic testing of first-degree family members [4], and may serve as a prognostic marker for the affected patient [5, 6]. Currently, the utility of genetic testing is limited by the fact that a substantial proportion (30-50%) of yielded variants are classified as variants of uncertain significance (VUS) according to the ACMG guidelines [7].
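The abstract names a Siamese neural network as the fine-tuning mechanism but does not state the training objective here. As a rough illustration only, the sketch below shows the classic pairwise contrastive loss often used with Siamese setups: two variant embeddings with the same label are pulled together, and embeddings with different labels are pushed at least a margin apart. The function names, the margin value, and the toy two-dimensional embeddings are all hypothetical and are not taken from DYNA itself.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Classic pairwise contrastive loss (assumed objective, not confirmed by the source).

    same_label=True  -> penalize any distance between the pair.
    same_label=False -> penalize only pairs closer than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical toy embeddings, standing in for the outputs of a shared encoder
# (e.g. a genomic foundation model) applied to three variants.
pathogenic_1 = [0.9, 0.1]
pathogenic_2 = [0.8, 0.2]
benign_1 = [0.1, 0.9]

same_pair_loss = contrastive_loss(pathogenic_1, pathogenic_2, same_label=True)
diff_pair_loss = contrastive_loss(pathogenic_1, benign_1, same_label=False)

print(f"same-label pair loss: {same_pair_loss:.4f}")
print(f"different-label pair loss: {diff_pair_loss:.4f}")
```

In a real fine-tuning setup the embeddings would come from the same encoder applied to both variants of a pair, with its weights updated to reduce this loss. The disease-specific element would lie in how pairs are labelled for a given gene-disease context, which this sketch does not model.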
May-31-2024