The divergence time of protein structures modelled by Markov matrices and its relation to the divergence of sequences
Rajapaksa, Sandun, Allison, Lloyd, Stuckey, Peter J., de la Banda, Maria Garcia, Konagurthu, Arun S.
arXiv.org Artificial Intelligence
The evolutionary distance between two species is proportional to some (unknown) function of the time of divergence from their common ancestor. One way to estimate this time is by comparing the underlying macromolecular sequences that carry the information of accumulated evolutionary changes across DNA → RNA → Proteins (sequence → structure → function). Since Zuckerkandl and Pauling (1965) introduced the molecular evolutionary clock for phylogenetic studies, several statistical models have been proposed to estimate the divergence of extant sequences from common ancestors, and to correlate these estimates with time estimates from other sources of information (e.g., fossil records) when they exist (Sarich and Wilson, 1967). Such divergence time estimates require reliable statistical models of DNA, RNA, and protein macromolecules (Bromham and Penny, 2003). For protein amino acid sequences, several statistical models have been proposed to explain sequence variation as a function of time. The point accepted mutation (PAM) matrix of Dayhoff et al. (1978) was the first successful model to explain the mutability of amino acid sequences. PAM is a stochastic (Markov) matrix defined in PAM (time) units, where PAM-1 is a Markov matrix that embodies a 1% expected change to the amino acids. Subsequent studies highlighted the importance of incorporating evolutionary time-dependent substitution and gap models as an elegant way to model the divergent relationships of proteins (Holmes, 1998; Gonnet et al., 1992). The recent approach of Sumanaweera et al. (2022) derives a unified statistical model for quantifying the evolution of pairs of protein sequences
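The PAM definition above can be made concrete with a small sketch. The code below is an illustration only, not the matrices or method of the paper: it uses a hypothetical 3-letter alphabet with assumed frequencies (`FREQS`) in place of the 20 amino acids, builds a toy PAM-1 matrix scaled to a 1% expected change, and obtains PAM-n as the n-th matrix power, which is the usual Markov-chain reading of PAM units. All names (`FREQS`, `make_pam1`, `expected_change`, `mat_pow`) are invented for this example.

```python
# Toy 3-letter alphabet with assumed (hypothetical) stationary frequencies.
# Real PAM matrices are 20 x 20 and estimated from aligned protein data.
FREQS = [0.5, 0.3, 0.2]


def make_pam1(freqs, target=0.01):
    """Build a row-stochastic toy matrix whose expected change equals `target`.

    Off-diagonal entry (i, j) is rate * freqs[j]; the rate is chosen so that
    sum_i freqs[i] * (1 - M[i][i]) == target.
    """
    rate = target / (1.0 - sum(f * f for f in freqs))
    n = len(freqs)
    return [
        [1.0 - rate * (1.0 - freqs[i]) if i == j else rate * freqs[j]
         for j in range(n)]
        for i in range(n)
    ]


def expected_change(matrix, freqs):
    """Probability that a residue drawn from `freqs` differs after one step."""
    return sum(f * (1.0 - matrix[i][i]) for i, f in enumerate(freqs))


def mat_mul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(matrix, n):
    """n-th matrix power (n >= 0) by repeated squaring."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    base = matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


PAM1 = make_pam1(FREQS)
PAM250 = mat_pow(PAM1, 250)  # 250 PAM units of divergence

if __name__ == "__main__":
    print("PAM-1 expected change:  ", expected_change(PAM1, FREQS))
    print("PAM-250 expected change:", expected_change(PAM250, FREQS))
```

In this toy model the expected change at PAM-250 stays well below 100%, because repeated substitutions at the same site can overwrite or revert earlier ones. PAM units therefore measure accumulated change along the Markov chain, not the observed percentage difference between two sequences.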
Aug-10-2023