LA4SR: illuminating the dark proteome with generative AI

Nelson, David R., Jaiswal, Ashish Kumar, Ismail, Noha, Mystikou, Alexandra, Salehi-Ashtiani, Kourosh

arXiv.org Artificial Intelligence 

1. Laboratory of Algal, Synthetic, and Systems Biology, Division of Science and Math, New York University Abu Dhabi (NYUAD), Abu Dhabi, UAE
2. Department of Biology, New York University, New York, NY, USA
3. Biotechnology Research Center, Technology Innovation Institute (TII), PO Box 9639, Masdar City, Abu Dhabi, UAE

Correspondence should be addressed to D.R.N. (drn2@nyu.edu)

The models achieved F1 scores up to 95 and operated 16,580x faster than BLASTP at 2.9x its recall. They effectively classified the algal "dark proteome" (e.g., uncharacterized proteins comprising ~65% of total proteins) and were validated on new data, including a new, complete Hi-C/PacBio Chlamydomonas genome. SR models reached high accuracy (F1 > 86) when trained on less than 2% of available data, rapidly achieving strong generalization capacity. High accuracy was achieved whether the training data had intact or scrambled terminal information, demonstrating robust generalization to incomplete sequences.
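The F1 and recall figures quoted above are standard binary-classification metrics. As a minimal sketch of how such scores are computed (not the authors' evaluation code), the Python below derives precision, recall, and F1 from true and predicted labels. The function name `precision_recall_f1` and the labels `"algal"` and `"contaminant"` are hypothetical and chosen only for illustration; the abstract appears to report F1 on a 0-100 scale, so the result is multiplied by 100 when printed.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, F1) for one positive class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels (assumed, not data from the paper).
y_true = ["algal", "algal", "algal", "contaminant", "contaminant"]
y_pred = ["algal", "algal", "contaminant", "algal", "contaminant"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="algal")
print(f"precision={precision:.3f} recall={recall:.3f} F1={100 * f1:.1f}")
```

On this reading, a "2.9x recall" comparison would be a ratio of two recall values measured on the same set of true positives, one per method.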
