Two Recent Developments in Machine Learning for Protein Engineering
Both articles in this post came out of George Church's lab at Harvard University. The first is "Unified rational protein engineering with sequence-based deep representation learning." Here, the authors present UniRep, a recurrent neural network (specifically, a multiplicative LSTM, or mLSTM) trained on 24 million UniRef50 protein sequences to predict the next amino acid in each sequence. Averaging the network's internal states over a sequence turns any protein, whatever its length, into a fixed-length numerical vector: a deep representation. These vectors make it possible to analyze and compare protein sequences with techniques borrowed from linear algebra rather than with traditional bioinformatics algorithms such as sequence alignment. Next, the authors show that UniRep vectors can serve as input for training a simpler, or "top," model (e.g., a linear regression) that predicts the effect of single mutations.
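To make these two ideas concrete, here is a minimal Python/NumPy sketch. It does not use UniRep itself. `embed` relies on a hypothetical toy encoder (a fixed random vector per amino acid) in place of the trained mLSTM, the wild-type sequence is made up, and the "mutation effects" are synthetic numbers generated from a hidden linear rule, not measurements. The top model is closed-form ridge regression, chosen here for simplicity and not necessarily the paper's exact model. What the sketch shows is the shape of the pipeline: average per-residue states into one fixed-length vector, compare vectors with cosine similarity, and fit a linear top model on those vectors to score single mutants.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HIDDEN_DIM = 16  # toy size; UniRep's own representation is much larger
rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for a trained encoder: one fixed random vector per
# amino acid. Unlike a real mLSTM, it ignores sequence context.
TOY_STATES = {aa: rng.normal(size=HIDDEN_DIM) for aa in AMINO_ACIDS}


def embed(seq: str) -> np.ndarray:
    """Average per-residue hidden states into one fixed-length vector."""
    states = np.stack([TOY_STATES[aa] for aa in seq])  # (len(seq), HIDDEN_DIM)
    return states.mean(axis=0)                          # (HIDDEN_DIM,)


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Compare two representations with plain linear algebra, no alignment."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def single_mutants(wild_type: str) -> list[str]:
    """Every single-substitution variant of wild_type."""
    return [
        wild_type[:i] + aa + wild_type[i + 1:]
        for i, wt_aa in enumerate(wild_type)
        for aa in AMINO_ACIDS
        if aa != wt_aa
    ]


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1e-2):
    """Closed-form ridge regression (the 'top' model) with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


# Hypothetical wild type and SYNTHETIC mutation effects (not real data).
wild_type = "MKTAYIAKQR"
variants = single_mutants(wild_type)
X = np.stack([embed(s) for s in variants])
true_w = rng.normal(size=HIDDEN_DIM)
y = X @ true_w + rng.normal(scale=0.05, size=len(variants))

# Train the top model on some variants and score the held-out ones.
order = rng.permutation(len(variants))
train, test = order[:150], order[150:]
w, b = fit_ridge(X[train], y[train])
pred = X[test] @ w + b
r = float(np.corrcoef(pred, y[test])[0, 1])
print(f"{len(variants)} variants, held-out Pearson r = {r:.3f}")
```

Because the toy encoder ignores context, its vectors reflect only amino-acid composition; the point is that sequences of any length map to vectors of the same size, which is what lets cosine similarity and off-the-shelf regression stand in for alignment-based comparison.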
Oct-15-2020, 15:50:30 GMT