ProteoKnight: Convolution-based phage virion protein classification and uncertainty analysis

Neha, Samiha Afaf, Bhuiyan, Abir Ahammed, Khan, Md. Ishrak

arXiv.org Artificial Intelligence 

Proteins are essential macromolecules responsible for the structural and functional mechanisms of living organisms. Phage structural proteins, also known as phage virion proteins (PVPs), are a subclass of proteins belonging to bacteriophages, which are the most abundant biological entities [1]. PVPs mainly build the structural constituents of the bacteriophage [2], such as the baseplate and head shown in Figure 1, enabling efficient host-phage binding for genome transfer. The amino acid sequences, particularly those involved in structure synthesis, exhibit significant diversity among phages and their respective groups [3]. Consequently, characterizing these sequences is a challenging yet crucial task, as the shortage of annotations for phage proteins has hindered numerous studies in phage genomics [4]. With the advent of high-throughput sequencing technology, sequences are being added rapidly to standard biological databases [6], as shown in Figure 1, creating a need for proper annotation of these sequences. Better annotation of phage proteins is vital for exploring effective antibacterial drug synthesis [7, 8], disease diagnosis [9], food production [10], and bacterial genome remodeling [11], among other applications. Traditionally, mass spectrometry and protein array techniques [12, 13] were used to characterize these proteins. However, these methods incur significant time, labor, and computational expenses [1]. Similarly, alignment-based methods do not always work well for PVP categorization because viral genomes lack collinearity [14] as a result of factors such as horizontal gene transfer and high mutation rates, leading to dissimilar sequences.

Figure 1: Structural constituents of a phage.
