Using software to compare genetic information in bacterial isolates from animals and people, researchers have predicted that fewer than 10% of Escherichia coli O157:H7 strains are likely to have the potential to cause human disease. According to Nadejda Lupolova, from the University of Edinburgh, Scotland, and colleagues, "machine-learning approaches have tremendous potential to interrogate complex genome information for which specific attributes of the organism, such as disease or isolation host, are known." The researchers published the results of their study in Proceedings of the National Academy of Sciences. Although most E. coli strains live in the gastrointestinal tracts of people and animals without causing disease, infection with E. coli O157 is associated with serious illness in people. E. coli O157 was first identified as a cause of human disease in the United States in 1982, during an investigation into an outbreak of hemorrhagic colitis.
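The study above learns a mapping from genome content to an attribute such as isolation host. As a hedged illustration only, the sketch below trains a plain perceptron, a swapped-in linear classifier and not necessarily the method Lupolova and colleagues used, on hypothetical gene presence/absence vectors. The data, the meaning of the features, and the names `train_perceptron` and `predict` are all invented for the example.

```python
from typing import List, Tuple

# HYPOTHETICAL toy data: each row records presence (1) or absence (0) of five
# accessory genes in one isolate; labels are +1 (human-derived) or -1
# (animal-derived). These vectors are synthetic and only exercise the code.
X: List[List[int]] = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
y: List[int] = [1, 1, -1, -1]


def train_perceptron(X: List[List[int]], y: List[int],
                     epochs: int = 20) -> Tuple[List[float], float]:
    """Fit a linear separator with the classic perceptron update rule."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified or on the decision boundary
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                errors += 1
        if errors == 0:  # every training isolate is on the correct side
            break
    return w, b


def predict(w: List[float], b: float, x: List[int]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else -1


w, b = train_perceptron(X, y)
print(predict(w, b, [1, 0, 0, 0, 1]))  # → 1
```

A linear model keeps each learned weight attached to one gene, which makes it easy to see which (hypothetical) genes push a prediction toward one host; real genome data would need far more isolates and proper cross-validation.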
A team of researchers has found a new way to detect dangerous strains of bacteria, potentially helping to prevent outbreaks of food poisoning. The team developed a machine-learning method and tested it on isolates of Escherichia coli. The details are published in the journal Proceedings of the National Academy of Sciences. Most Escherichia coli strains are harmless and occur naturally in the human body; pathogenic strains, however, are a rising health concern.
This paper deals with the relations among structural, topological, and chemical properties of the E. coli proteome from the vantage point of the solubility/aggregation propensity of proteins. Each E. coli protein is initially represented according to its known folded 3D shape; this step consists of representing the available E. coli proteins as graphs. We first analyze those graphs by considering purely topological characterizations, i.e., by analyzing the mass fractal dimension and the distributions of both shortest paths and vertex degrees. The results confirm the general architectural principles of proteins. Next, we focus on the statistical properties of a representation of such graphs as vectors composed of several numerical features, which we extracted from their structural representation. We found that protein size is the main discriminator for solubility, although other factors also help explain the degree of solubility. Finally, we analyze these data with a novel one-class classifier, with the aim of discriminating between highly and poorly soluble proteins. The results are encouraging and consolidate the potential of pattern recognition techniques for describing complex biological systems.
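The graph representation and the topological measures described above can be sketched in a few lines. The abstract does not say how edges are defined, so this sketch assumes a common convention: one vertex per residue, with an edge whenever two residue coordinates (for example, Cα atoms) lie within a hypothetical 8 Å cutoff. It then computes the degree distribution and the shortest-path-length distribution (via breadth-first search); the mass fractal dimension and the one-class classifier are not covered. The function names and toy coordinates are illustrative assumptions.

```python
import math
from collections import Counter, deque
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(coords: Sequence[Coord], cutoff: float = 8.0) -> Dict[int, List[int]]:
    """Adjacency list linking residues whose coordinates lie within `cutoff` (Å).

    The 8 Å default is an ASSUMED threshold, not taken from the paper.
    """
    adj: Dict[int, List[int]] = {i: [] for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].append(j)
            adj[j].append(i)
    return adj


def degree_distribution(adj: Dict[int, List[int]]) -> Counter:
    """Map each vertex degree to the number of vertices that have it."""
    return Counter(len(neighbours) for neighbours in adj.values())


def shortest_path_distribution(adj: Dict[int, List[int]]) -> Counter:
    """Count unordered vertex pairs by hop distance, running BFS from every vertex."""
    counts: Counter = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target > source:  # count each unordered pair exactly once
                counts[d] += 1
    return counts


# Toy example: five residues on a straight line, 3.8 Å apart (NOT a real protein).
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
adj = contact_graph(coords)
print(dict(degree_distribution(adj)))         # → {2: 2, 3: 2, 4: 1}
print(dict(shortest_path_distribution(adj)))  # → {1: 7, 2: 3}
```

The all-pairs BFS is quadratic in the number of residues, which is acceptable for single proteins; on the toy chain each residue contacts its neighbours one and two positions away, so every pair is at most two hops apart.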
Initiation of transcription is the first step in gene expression and constitutes an important point of control in prokaryotes as well as in eukaryotes (Reznikoff et al. 1985). Transcription initiates when RNA polymerase recognizes and binds to specific DNA sequences termed promoters. After binding, a short stretch of the DNA double helix is unwound, and the polymerase starts to synthesize RNA by complementary base pairing. The promoter sequence determines the position of the transcriptional start point and also influences how frequently the gene is transcribed (the strength of the promoter).
Escherichia coli Promoters
In the prokaryote E. coli, the form of RNA polymerase responsible for recognizing promoter sequences (the holoenzyme) has the subunit composition α₂ββ′σ.
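To make the idea of promoter recognition concrete, the sketch below scans a DNA string for candidate E. coli promoters. It assumes the widely cited σ70 consensus hexamers TTGACA (around position -35) and TATAAT (around -10), separated by a spacer of 15 to 19 bp; these details are not stated in the passage above. The scorer simply counts matches to each hexamer, so it is a naive illustration rather than a model of promoter strength, and the names `scan_promoters` and `min_score` are hypothetical.

```python
from typing import List, Tuple

MINUS35 = "TTGACA"       # sigma-70 consensus hexamer near -35 (assumed)
MINUS10 = "TATAAT"       # sigma-70 consensus hexamer near -10 (assumed)
SPACERS = range(15, 20)  # assumed spacer lengths between the hexamers, 15-19 bp


def matches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def scan_promoters(seq: str, min_score: int = 10) -> List[Tuple[int, int, int]]:
    """Return (start of -35 box, start of -10 box, score) tuples, best first.

    score = matches to the -35 hexamer + matches to the -10 hexamer (max 12).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        s35 = matches(seq[i:i + 6], MINUS35)
        for spacer in SPACERS:
            j = i + 6 + spacer  # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                break  # longer spacers would also run off the end
            score = s35 + matches(seq[j:j + 6], MINUS10)
            if score >= min_score:
                hits.append((i, j, score))
    hits.sort(key=lambda h: -h[2])
    return hits


# Synthetic test sequence: perfect consensus boxes with a 17 bp GC spacer.
seq = "GC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GC"
print(scan_promoters(seq))  # → [(2, 25, 12)]
```

Real promoters rarely match both hexamers perfectly, which is why the sketch tolerates mismatches through `min_score`; a position weight matrix would be the usual next step.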