How to use machine learning to identify "good" customers vs "bad" customers - BDO Canada - IT Solutions


Good, profitable customers rarely become unprofitable. This method has become simplistic, because customers rarely consume overhead costs equally. Activity-Based Costing (ABC) bases customer profitability analysis on the principle that customers consume activities. A model can be built that calculates a loss ratio based on ABC.
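As a minimal sketch of the activity-based idea, the Python below allocates each activity's cost pool to customers in proportion to the driver units they consume, then derives a per-customer profit. The activity names, the figures, and the `cost_to_revenue` ratio are hypothetical illustrations and are not taken from the article or from any BDO model.

```python
def abc_profitability(activity_costs, usage, revenue):
    """Allocate activity cost pools to customers by driver usage.

    activity_costs: {activity: total cost of that activity}
    usage:          {customer: {activity: driver units consumed}}
    revenue:        {customer: revenue}
    """
    # Total driver units consumed per activity across all customers.
    totals = {a: sum(u.get(a, 0) for u in usage.values()) for a in activity_costs}
    report = {}
    for customer, consumed in usage.items():
        cost = 0.0
        for activity, pool in activity_costs.items():
            if totals[activity] > 0:
                # Each customer's share of the pool equals its share of the driver.
                cost += pool * consumed.get(activity, 0) / totals[activity]
        rev = revenue[customer]
        report[customer] = {
            "allocated_cost": cost,
            "profit": rev - cost,
            "cost_to_revenue": cost / rev if rev else float("inf"),
        }
    return report


# Hypothetical figures for illustration only.
activity_costs = {"support_calls": 1000.0, "deliveries": 500.0}
usage = {
    "A": {"support_calls": 10, "deliveries": 5},
    "B": {"support_calls": 90, "deliveries": 5},
}
revenue = {"A": 800.0, "B": 800.0}
report = abc_profitability(activity_costs, usage, revenue)
```

Both customers bring in the same revenue, yet customer B comes out unprofitable because it consumes most of the support activity. Splitting the same overhead evenly would hide that difference.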

Carbon Black warns that artificial intelligence is not a silver bullet


The research, which Carbon Black says looked "Beyond the Hype", found that the roles of AI and ML in preventing cyber-attacks have been met with both hope and skepticism. The vast majority (93 percent) of the 400 security researchers interviewed for this research said non-malware attacks pose more of a business risk than commodity malware attacks and, more importantly, that these attacks are often not stopped by traditional anti-virus offerings. Mike Viscuso, co-founder and CTO of Carbon Black, told SC Media UK: "Researchers have reported seeing an increase in the number, and sophistication, of non-malware attacks. These attacks are specifically designed to evade file-based prevention mechanisms and leverage native operating system tools to keep attackers under the radar." One respondent explained: "Most users seem to be familiar with the idea that their computer or network may have accidentally become infected with a virus, but rarely consider a person who is actually attacking them in a more proactive and targeted manner."

The Care and Feeding of Machine Learning - Carbon Black


Ingest incoming binaries: extract and compute features, statistics, and abstractions from incoming binaries. Binaries come from customers, partners, and trawls of the web for diverse goodware and malware samples. The output of this task is a series of predictions about each binary's potential maliciousness and its relationship to known malware families. Intelligence comes from our partners, our customers, and Carbon Black malware analysts.
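The feature-extraction step can be illustrated with a minimal sketch that computes a handful of static statistics from raw file bytes. The feature set here (size, byte entropy, printable ratio, and an `MZ` header check) is an assumption chosen for illustration; the article does not say which features Carbon Black actually computes.

```python
import math
from collections import Counter


def extract_features(data: bytes) -> dict:
    """Compute a few simple static features from raw file bytes."""
    size = len(data)
    counts = Counter(data)
    # Shannon entropy in bits per byte; high values often indicate
    # packed or encrypted content.
    entropy = -sum((c / size) * math.log2(c / size) for c in counts.values()) if size else 0.0
    printable = sum(1 for b in data if 32 <= b < 127)
    return {
        "size": size,
        "entropy": entropy,
        "printable_ratio": printable / size if size else 0.0,
        "has_mz_header": data[:2] == b"MZ",  # DOS/PE executable magic bytes
    }


# A synthetic byte string stands in for an incoming binary.
features = extract_features(b"MZ" + bytes(range(256)))
```

In a pipeline like the one described, a feature dictionary of this kind would be the input to a trained classifier that produces the maliciousness predictions.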

Accurate, fully-automated NMR spectral profiling for metabolomics

Many diseases cause significant changes to the concentrations of small molecules (aka metabolites) that appear in a person's biofluids, which means such diseases can often be readily detected from a person's "metabolic profile". This information can be extracted from a biofluid's NMR spectrum. Today, this is often done manually by trained human experts, which makes the process relatively slow, expensive and error-prone. This paper presents a tool, Bayesil, that can quickly, accurately and autonomously produce a complex biofluid's (e.g., serum or CSF) metabolic profile from a 1D ¹H NMR spectrum. This requires first performing several spectral processing steps, then matching the resulting spectrum against a reference compound library, which contains the "signatures" of each relevant metabolite. Many of these steps are novel algorithms, and our matching step views spectral matching as an inference problem within a probabilistic graphical model that rapidly approximates the most probable metabolic profile. Our extensive studies on a diverse set of complex mixtures show that Bayesil can autonomously find the concentration of all NMR-detectable metabolites accurately (~90% correct identification and ~10% quantification error), in <5 minutes on a single CPU. These results demonstrate that Bayesil is the first fully-automatic publicly-accessible system that provides quantitative NMR spectral profiling effectively -- with an accuracy that meets or exceeds the performance of trained experts. We anticipate this tool will usher in high-throughput metabolomics and enable a wealth of new applications of NMR in clinical settings. Available at
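To make the idea of matching a spectrum against a reference library concrete, the sketch below fits a mixture spectrum as a non-negative combination of reference signatures using projected gradient descent. This is a deliberately simpler, swapped-in technique (non-negative least squares), not Bayesil's probabilistic graphical model. The seven-point signatures and the function name `fit_concentrations` are hypothetical toy values, not real NMR data.

```python
def fit_concentrations(mixture, library, steps=2000, lr=0.01):
    """Estimate non-negative concentrations c minimizing
    || sum_k c_k * signature_k - mixture ||^2 by projected gradient descent.

    mixture: list of intensities
    library: {name: reference signature with the same length as mixture}
    """
    names = list(library)
    conc = {n: 0.0 for n in names}
    for _ in range(steps):
        # Residual between the current reconstruction and the observed spectrum.
        residual = [
            sum(conc[n] * library[n][i] for n in names) - y
            for i, y in enumerate(mixture)
        ]
        for n in names:
            grad = 2.0 * sum(r * s for r, s in zip(residual, library[n]))
            # Gradient step, then clamp at zero: concentrations cannot be negative.
            conc[n] = max(0.0, conc[n] - lr * grad)
    return conc


# Toy signatures (made up for illustration, not measured spectra).
library = {
    "lactate": [0, 1, 2, 1, 0, 0, 0],
    "glucose": [0, 0, 0, 1, 2, 1, 0],
}
# Synthetic mixture: 2.0 units of "lactate" plus 0.5 units of "glucose".
mixture = [2.0 * a + 0.5 * b for a, b in zip(library["lactate"], library["glucose"])]
estimate = fit_concentrations(mixture, library)
```

On this noise-free toy input the fit recovers the two concentrations used to build the mixture. Real spectra add peak shifts, baseline distortion, and overlapping signals, which the spectral processing steps and probabilistic matching described above are designed to handle.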

Random forest models of the retention constants in the thin layer chromatography

In the current study, we examine the application of machine learning methods to modelling retention constants in thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors relevant to various molecular properties, most of them redundant and not relevant to predicting the retention constant. Hence, we employed feature selection to significantly reduce the number of attributes. Additionally, we tested applying the bagging procedure to feature selection. Random forest regression models were then built using the selected variables. The resulting models correlate better with the experimental data than the reference models obtained with linear regression. Cross-validation confirms the robustness of the models.
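One way to picture bagging applied to feature selection is to repeat a simple selection rule on bootstrap resamples and keep the descriptors that are chosen consistently. The sketch below does this with a Pearson-correlation filter, which is an assumption made for illustration; the abstract does not state which selection method the authors used, and the random forest regression step itself is not shown. The data are synthetic.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def bagged_selection(X, y, top_k, n_bags=50, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    hits = [0] * p
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        scores = [abs(pearson([X[i][j] for i in idx], yb)) for j in range(p)]
        ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            hits[j] += 1
    return [h / n_bags for h in hits]


# Synthetic data: descriptor 0 drives the response, descriptors 1-4 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [3.0 * row[0] + rng.gauss(0, 0.01) for row in X]
freq = bagged_selection(X, y, top_k=1)
```

Descriptors with a high selection frequency across resamples are the stable candidates to pass on to the regression model; those selected only occasionally are likely redundant or irrelevant.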