AlphaFold Distillation for Protein Design

Igor Melnyk, Aurelie Lozano, Payel Das, Vijil Chenthamarakshan

arXiv.org Artificial Intelligence 

Although our proposed AFDistill system is novel, efficient, and showed promising results in our evaluations, the current approach has a number of limitations:

- Dependency on the accuracy of the AlphaFold forward folding model: the quality of the distilled model is directly tied to the accuracy of the original forward folding model, and AFDistill inherits its biases.

- Limited coverage of protein sequence space: despite recent advances, forward folding models are still limited in their ability to accurately predict the structure of many protein sequences, and this limitation extends to the TM score and pLDDT confidence metrics that AFDistill relies on.

- Uncertainty in structural predictions: the confidence metrics (TM score and pLDDT) used in the distillation process are themselves subject to uncertainty, which can introduce errors into the distilled model's predictions and ultimately degrade the quality of the generated sequences in downstream applications (see the sketch of this usage pattern below).

- Large computational cost of training: training the AFDistill model requires significant computational resources. However, this cost may be mitigated by an amortization effect, where the high upfront training cost pays off through cheap and fast inference in downstream applications (a back-of-the-envelope break-even calculation follows the sketch below).
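To make the uncertainty concern concrete, below is a minimal PyTorch sketch of the general usage pattern this section alludes to: a frozen distilled scorer maps a (soft) generated sequence to a predicted confidence score, which is added to the sequence generator's loss as a structure-consistency regularizer. The `DistilledScorer` architecture, sequence length, and regularizer weight here are illustrative assumptions for the sketch, not the actual AFDistill implementation.

```python
# Illustrative sketch (not the paper's code): a frozen distilled scorer
# used as a differentiable structure-consistency regularizer.
import torch
import torch.nn as nn

VOCAB = 20  # amino-acid alphabet size

class DistilledScorer(nn.Module):
    """Stand-in for a distilled confidence predictor (sequence -> score)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(VOCAB, dim)  # accepts soft one-hot inputs
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, soft_seq: torch.Tensor) -> torch.Tensor:
        h = self.embed(soft_seq).mean(dim=1)   # pool over residue positions
        return self.head(h).squeeze(-1)        # predicted confidence in [0, 1]

scorer = DistilledScorer().eval()
for p in scorer.parameters():                  # the distilled scorer stays frozen
    p.requires_grad_(False)

logits = torch.randn(2, 128, VOCAB, requires_grad=True)  # generator output
soft_seq = logits.softmax(dim=-1)              # differentiable sequence relaxation
task_loss = torch.tensor(0.0)                  # placeholder for the generator's own loss
reg = 1.0 - scorer(soft_seq).mean()            # push toward high predicted confidence
loss = task_loss + 0.1 * reg                   # 0.1 is an arbitrary example weight
loss.backward()                                # gradients flow back to the generator
```

Because the regularizer term is exactly the scorer's prediction, any bias or uncertainty in the distilled confidence estimates directly reshapes the gradient signal the generator receives.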
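To illustrate the amortization argument, here is a back-of-the-envelope break-even calculation. All GPU-hour figures are made-up placeholders for the example, not measured costs from the paper.

```python
# Hypothetical amortization arithmetic: after how many downstream scoring
# calls does one-time distillation training pay for itself?

def break_even_queries(train_cost_gpu_h: float,
                       folding_cost_gpu_h: float,
                       distill_cost_gpu_h: float) -> float:
    """Calls after which cheaper per-call inference recoups the training cost."""
    savings_per_call = folding_cost_gpu_h - distill_cost_gpu_h
    if savings_per_call <= 0:
        raise ValueError("Distilled inference must be cheaper than full folding.")
    return train_cost_gpu_h / savings_per_call

# E.g., 2,000 GPU-hours of distillation training, 0.1 GPU-hours per full
# AlphaFold evaluation vs. 0.0001 GPU-hours per distilled-model call:
print(break_even_queries(2_000, 0.1, 0.0001))  # ~20,020 calls
```

Under these hypothetical numbers, the upfront cost is recouped after roughly 20,000 scoring calls, after which every further call is a net saving; the actual break-even point depends on the real training and inference costs.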