A deep neural network architecture was constructed to predict amino acid side chain conformation with unprecedented accuracy. Side chain conformation prediction is essential for protein homology modeling and protein design. Widely adopted current methods use physics-based energy functions to evaluate side chain conformations. Here, using a deep neural network architecture without physics-based assumptions, we demonstrate that side chain conformation prediction accuracy can be improved by more than 25% compared with current standard methods, especially for aromatic residues. More strikingly, the prediction method presented here is robust enough to identify individual conformational outliers in high-resolution structures from the Protein Data Bank without being given their structure factors. We envisage that our amino acid side chain predictor could be used as a quality check step in future protein structure model validation, and in many other potential applications such as side chain assignment in cryo-electron microscopy, automated model building in crystallography, protein folding, and small-molecule ligand docking.
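Side chain conformation is usually described by the chi dihedral angles of each residue, so a prediction can be scored by comparing predicted and reference angles. The following minimal sketch shows how a dihedral angle could be computed from four atom positions and compared against a reference value. It is not the method of the work summarized above: the function names are hypothetical, and the 40 degree tolerance is an assumption borrowed from common side chain benchmarks, not a value stated in the abstract.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, in (-180, 180].

    For chi1 the four points would be the N, CA, CB and CG atoms.
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)   # central bond
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))

def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, handling wraparound."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def is_correct(predicted, reference, tolerance=40.0):
    """Assumed scoring rule: a prediction counts if it is within `tolerance`."""
    return angle_diff_deg(predicted, reference) <= tolerance
```

Note that the wraparound handling matters: a predicted angle of 170 degrees and a reference angle of -170 degrees differ by 20 degrees, not 340.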
Problems related to an inadequate data base of interpretation rules. The same set of production rules can suggest possible structural interpretations of 13C spectral features. Any individual 13C feature permits a great variety of structural interpretations. This paper presents an "expert system" devised to aid organic chemists in determining the structure (i.e., the arrangement of atoms and bonds) of newly isolated, naturally occurring compounds. The system exploits a data base of rules for analyzing 13C spectra.
One of the most fundamental problems in molecular biology is the prediction of tertiary structure from primary structure: the protein folding problem. The goal in the protein folding problem is to predict one feature of a folded protein (the 3D coordinates of its backbone atoms) from another feature (the sequence of amino acid residues that make up the protein). The protein folding problem is of enormous practical importance because the latter feature (the primary structure) is much easier to establish than the former (the tertiary structure). A related problem is the buriedness problem: the prediction of the degree of exposure to the solvent (the buriedness) of each amino acid residue in a folded protein. Some amino acid residues will have a buriedness of 0%: these are in the core of the protein and are likely hydrophobic. Other residues will have a buriedness of 100%: these are on the surface of the protein and are probably hydrophilic. The buriedness problem is interesting because it is a simplified version of the protein folding problem. In this paper I will show that genetic programming (Koza 1992; Koza 1994) does find programs that predict the buriedness of residues. These programs work better than would be expected of randomly generated programs, and there is very little externally imposed bias towards any particular program size, shape (architecture) or composition.
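To make the idea of evolved "programs" concrete, the sketch below shows one way a candidate program could be represented as an expression tree and scored against known exposure values. It is an illustration under stated assumptions, not the setup of the paper: the function set, the two per-residue features (`hydrophobicity`, `neighbor_mean`), and the mean-absolute-error fitness are all hypothetical choices. It also covers only representation, evaluation, and picking the best of a random population, which corresponds to the randomly generated baseline; crossover and mutation, which a full genetic programming run would add, are omitted.

```python
import random

# Hypothetical function set; each function takes two numeric arguments.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
# Hypothetical per-residue input features (terminals).
TERMINALS = ["hydrophobicity", "neighbor_mean"]

def random_tree(rng, depth):
    """Build a random expression tree: tuples are function nodes, leaves are
    feature names or numeric constants."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return rng.choice(TERMINALS)
        return round(rng.uniform(-1.0, 1.0), 2)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, features):
    """Evaluate a tree for one residue, given a dict of its feature values."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, features), evaluate(right, features))
    if isinstance(tree, str):
        return features[tree]
    return tree  # numeric constant

def fitness(tree, cases):
    """Mean absolute error between predicted and known exposure (lower is better)."""
    return sum(abs(evaluate(tree, f) - target) for f, target in cases) / len(cases)

def best_of_random(cases, n=200, depth=3, seed=0):
    """Random-search baseline: the best tree among n randomly generated ones."""
    rng = random.Random(seed)
    population = [random_tree(rng, depth) for _ in range(n)]
    return min(population, key=lambda t: fitness(t, cases))
```

A fitness case here is a pair of a feature dict and a target exposure fraction between 0 and 1. Comparing the fitness of evolved programs with the value returned by `best_of_random` is one simple way to check the claim that evolved programs beat randomly generated ones.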
An overview is given of the development of QSPR/QSAR equations using various descriptor mining techniques and multilinear regression analysis within the CODESSA program (Comprehensive Descriptors for Structural and Statistical Analysis). The description of the methodologies applied in CODESSA is followed by a presentation of the QSAR and QSPR models derived for eighteen molecular activities and properties. The properties cover single molecular species, interactions between different molecular species, properties of surfactants, complex properties, and properties of polymers.
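A multilinear QSPR/QSAR equation expresses a property as an intercept plus a weighted sum of molecular descriptors. The sketch below fits such an equation by ordinary least squares using the normal equations. It is a generic illustration and not CODESSA's own procedure or API: the function names are hypothetical, and descriptor selection, which the abstract refers to as descriptor mining, is not shown.

```python
def fit_multilinear(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of descriptor rows, y the list of measured property values.
    Returns [b0, b1, ..., bk]. Solves the normal equations (X^T X) b = X^T y
    by Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; system is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(coeffs, descriptors):
    """Apply a fitted equation to one molecule's descriptor values."""
    return coeffs[0] + sum(c * d for c, d in zip(coeffs[1:], descriptors))
```

For real descriptor sets, which are often strongly correlated, a QR- or SVD-based solver from a numerical library would be a more stable choice than the normal equations; the version above is kept dependency-free for clarity.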