Band Target Entropy Minimization and Target Partial Least Squares for Spectral Recovery and Calibration Machine Learning

Resolving and calibrating the pure spectra of minority components in measurements of chemical mixtures, without prior knowledge of the mixture, is a challenging problem. In this work, a combination of band target entropy minimization (BTEM) and target partial least squares (T-PLS) was used to obtain estimates for single pure component spectra and to calibrate those estimates in a true, one-at-a-time fashion. This approach allows minor components to be targeted and their relative amounts estimated in the presence of other varying components in spectral data. The use of T-PLS estimation improves on the BTEM method because it overcomes the need to identify all of the pure components prior to estimation. Estimated amounts from this combination were found to be similar to those obtained from a standard method, multivariate curve resolution-alternating least squares (MCR-ALS), on a simple, three-component mixture dataset. Studies on two experimental datasets demonstrate cases where the combination of BTEM and T-PLS could model the pure component spectra and obtain concentration profiles of minor components, but MCR-ALS could not.
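
For orientation, the bilinear mixture model that curve-resolution methods such as MCR-ALS conventionally fit can be written as below. The symbols are notation assumed here for illustration and do not come from the abstract: D holds the measured mixture spectra (one row per measurement), C the concentration profiles, S the pure component spectra, and E the residual.

```latex
\[
  \mathbf{D} = \mathbf{C}\,\mathbf{S}^{\mathsf{T}} + \mathbf{E},
  \qquad
  \mathbf{D} \in \mathbb{R}^{m \times n},\;
  \mathbf{C} \in \mathbb{R}^{m \times k},\;
  \mathbf{S} \in \mathbb{R}^{n \times k}
\]
```

MCR-ALS estimates all k columns of C and S together, whereas the combination described above targets one column of S at a time and then calibrates it.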

Rapid Bayesian optimisation for synthesis of short polymer fiber materials Machine Learning

The discovery of processes for the synthesis of new materials involves many decisions about process design, operation, and material properties. Experimentation is crucial but as complexity increases, exploration of variables can become impractical using traditional combinatorial approaches. We describe an iterative method which uses machine learning to optimise process development, incorporating multiple qualitative and quantitative objectives. We demonstrate the method with a novel fluid processing platform for synthesis of short polymer fibers, and show how the synthesis process can be efficiently directed to achieve material and process objectives.

Prediction of amino acid side chain conformation using a deep neural network Machine Learning

A deep neural network based architecture was constructed to predict amino acid side chain conformation with unprecedented accuracy. Amino acid side chain conformation prediction is essential for protein homology modeling and protein design. Current widely adopted methods use physics-based energy functions to evaluate side chain conformation. Here, using a deep neural network architecture without physics-based assumptions, we demonstrate that side chain conformation prediction accuracy can be improved by more than 25% compared with current standard methods, especially for aromatic residues. More strikingly, the prediction method presented here is robust enough to identify individual conformational outliers from high resolution structures in a protein data bank without providing its structural factors. We envisage that our amino acid side chain predictor could be used as a quality check step for future protein structure model validation and in many other potential applications, such as side chain assignment in cryo-electron microscopy, crystallography model auto-building, protein folding and small molecule ligand docking.

How to use machine learning to identify "good" customers vs "bad" customers - BDO Canada - IT Solutions


Good, profitable customers rarely become unprofitable. It is more likely that they were unprofitable from the outset. Determining an approach to define customer value can be a complex decision. Traditionally, we use gross margin to identify good and bad customers. For example, if your overhead costs are 25% of gross revenue, a good customer is anyone with a gross margin over 25%.
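
The gross-margin rule above can be sketched in a few lines of Python. This is a minimal illustration, not BDO's method: the function names, the customer names and the revenue and cost figures are all hypothetical, and gross margin is assumed to mean (revenue - cost of goods) / revenue.

```python
def gross_margin(revenue: float, cost_of_goods: float) -> float:
    """Gross margin as a fraction of revenue (assumed definition)."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cost_of_goods) / revenue


def is_good_customer(revenue: float, cost_of_goods: float,
                     overhead_rate: float = 0.25) -> bool:
    """A customer is 'good' when gross margin exceeds the overhead rate."""
    return gross_margin(revenue, cost_of_goods) > overhead_rate


# Hypothetical customers: (revenue, cost of goods sold)
customers = {"acme": (1000.0, 600.0), "globex": (1000.0, 800.0)}

labels = {
    name: "good" if is_good_customer(revenue, cost) else "bad"
    for name, (revenue, cost) in customers.items()
}
print(labels)
```

With overhead at 25% of revenue, the first customer (40% margin) is labelled good and the second (20% margin) bad.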

Carbon Black warns that artificial intelligence is not a silver bullet


The research, which Carbon Black says looked "Beyond the Hype", found that the roles of AI and ML in preventing cyber-attacks have been met with both hope and skepticism. The vast majority (93 percent) of the 400 security researchers interviewed for this research said non-malware attacks pose more of a business risk than commodity malware attacks and, more importantly, that these are often not stopped by traditional anti-virus offerings. Mike Viscuso, co-founder and CTO of Carbon Black, told SC Media UK: "Researchers have reported seeing an increase in the number, and sophistication, of non-malware attacks. These attacks are specifically designed to evade file-based prevention mechanisms and leverage native operating system tools to keep attackers under the radar." One respondent explained: "Most users seem to be familiar with the idea that their computer or network may have accidentally become infected with a virus, but rarely consider a person who is actually attacking them in a more proactive and targeted manner."

The Care and Feeding of Machine Learning - Carbon Black


The output of this task is a series of predictions about binaries' potential maliciousness and relationships to known malware families. These predictions are validated against outside intelligence.

Accurate, fully-automated NMR spectral profiling for metabolomics Artificial Intelligence

Many diseases cause significant changes to the concentrations of small molecules (aka metabolites) that appear in a person's biofluids, which means such diseases can often be readily detected from a person's "metabolic profile". This information can be extracted from a biofluid's NMR spectrum. Today, this is often done manually by trained human experts, which makes the process relatively slow, expensive and error-prone. This paper presents a tool, Bayesil, that can quickly, accurately and autonomously produce a complex biofluid's (e.g., serum or CSF) metabolic profile from a 1D 1H NMR spectrum. This requires first performing several spectral processing steps, then matching the resulting spectrum against a reference compound library, which contains the "signatures" of each relevant metabolite. Many of these steps are novel algorithms, and our matching step views spectral matching as an inference problem within a probabilistic graphical model that rapidly approximates the most probable metabolic profile. Our extensive studies on a diverse set of complex mixtures show that Bayesil can autonomously find the concentration of all NMR-detectable metabolites accurately (~90% correct identification and ~10% quantification error), in <5 minutes on a single CPU. These results demonstrate that Bayesil is the first fully-automatic publicly-accessible system that provides quantitative NMR spectral profiling effectively -- with an accuracy that meets or exceeds the performance of trained experts. We anticipate this tool will usher in high-throughput metabolomics and enable a wealth of new applications of NMR in clinical settings. Available at

Bayesian Source Separation Applied to Identifying Complex Organic Molecules in Space Machine Learning

Emission from a class of benzene-based molecules known as Polycyclic Aromatic Hydrocarbons (PAHs) dominates the infrared spectrum of star-forming regions. The observed emission appears to arise from the combined emission of numerous PAH species, each with its unique spectrum. Linear superposition of the PAH spectra identifies this problem as a source separation problem. It is, however, of a formidable class of source separation problems given that different PAH sources potentially number in the hundreds, even thousands, and there is only one measured spectral signal for a given astrophysical site. Fortunately, the source spectra of the PAHs are known, but the signal is also contaminated by other spectral sources. We describe our ongoing work in developing Bayesian source separation techniques relying on nested sampling in conjunction with an ON/OFF mechanism enabling simultaneous estimation of the probability that a particular PAH species is present and its contribution to the spectrum.
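
One way to write down the superposition model with an ON/OFF mechanism is sketched below. All symbols are assumptions introduced for illustration, not the authors' notation: F is the single measured spectrum, f_k the known spectrum of PAH species k, s_k a binary ON/OFF indicator, c_k a non-negative contribution, B the contaminating non-PAH emission, and epsilon the noise.

```latex
\[
  F(\lambda) = \sum_{k=1}^{K} s_k\, c_k\, f_k(\lambda) + B(\lambda) + \varepsilon(\lambda),
  \qquad s_k \in \{0, 1\},\quad c_k \ge 0
\]
```

Under this reading, the posterior probability that species k is present is P(s_k = 1 | F), and its contribution is summarised by the posterior over c_k given s_k = 1.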

Random forest models of the retention constants in the thin layer chromatography Artificial Intelligence

In the current study, we examine an application of machine learning methods to modelling retention constants in thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors relevant to various molecular properties, most of them redundant and not relevant to predicting the retention constant. Hence, we employed feature selection to significantly reduce the number of attributes. Additionally, we tested applying the bagging procedure to feature selection. The random forest regression models were built using the selected variables. The resulting models correlate better with the experimental data than the reference models obtained with linear regression. Cross-validation confirms the robustness of the models.
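
The idea of bagging a feature-selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: a simple absolute Pearson correlation filter stands in for whichever selector the study used, the data are synthetic, and the function and parameter names (`bagged_feature_selection`, `top_k`, `min_frequency`) are hypothetical. Each bootstrap resample votes for its top-ranked descriptors, and only descriptors chosen in a large enough share of resamples are kept.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def bagged_feature_selection(X, y, top_k=1, n_rounds=50,
                             min_frequency=0.5, seed=0):
    """Keep features ranked in the top_k in at least min_frequency of the
    bootstrap resamples (correlation filter used as a stand-in selector)."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    votes = [0] * n_features
    for _ in range(n_rounds):
        # Bootstrap: draw n_samples row indices with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        ys = [y[i] for i in idx]
        scores = [abs(pearson([X[i][j] for i in idx], ys))
                  for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: scores[j],
                        reverse=True)
        for j in ranked[:top_k]:
            votes[j] += 1
    return [j for j in range(n_features)
            if votes[j] / n_rounds >= min_frequency]


# Synthetic data: descriptor 0 is linear in the target; 1 and 2 are not.
y = [float(i) for i in range(20)]
X = [[2.0 * v + 1.0, float((i * 7) % 5), float((i * 3) % 4)]
     for i, v in enumerate(y)]

selected = bagged_feature_selection(X, y)
print(selected)
```

The retained column indices would then be passed to the regression model, a random forest in the study above.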

Decision support system for the evolutionary classification of protein structures. Liisa Holm and Chris Sander

AAAI Conferences

Taxonomic classification has a long tradition in biology. Classic work by Linné, Darwin and Wallace organized species of plants and animals in a hierarchy based on common morphological characters. Access to the genotype has allowed molecular phylogenies to be constructed not only of species of organisms but also within and between protein families. The concept of evolution, in which gradual changes to protein phenotype (structure and function) result from amino acid replacements, has made searching databases for significant sequence similarities a standard technique for the functional characterization of newly determined genes. In constructing molecular phylogenies, the use of sequence information has two limitations. First, the accuracy of predicted biological function differs between orthologous genes (e.g., myoglobins in the muscle of whales and humans) and paralogous genes (e.g., myoglobin and leghemoglobin in the roots of plants). Second, protein folds appear to be compatible with a very wide range of

Figure 1. Functional residues are conserved and cluster in 3D. (ISMB-97)