Markov Random Fields and Mass Spectra Discrimination
Mass spectrometry can involve two soft ionization techniques: matrix-assisted laser desorption ionization (MALDI) and surface-enhanced laser desorption ionization (SELDI). For each analyzed fluid sample, MALDI or SELDI hardware generates a high-dimensional mass spectrum, recording between 10,000 and 20,000 "mass-to-charge (m/z) ratios" corresponding to the ionized peptides present in the fluid sample, as well as "intensities" roughly quantifying the concentrations of these peptides in the sample. Generally, m/z ratios take values anywhere between 200 and 20,000 Daltons, and are acquired with a known relative accuracy ρ, which depends on the acquisition modalities and ranges from 0.1% to 0.3%. Analyzing this type of high-dimensional data often requires specialized software tools implementing sophisticated machine learning techniques such as support vector machines (SVM) (Li and others (2004), Yu and others (2005)), artificial neural networks (Ball and others (2002)), or random forests (Izmirlian (2004)). These techniques typically generate "black-box" classifiers, which often reach good discrimination levels between cancerous and control groups but are difficult to interpret biologically in terms of characteristic biomarker patterns. This often leads to unexpected performance variations when the classifiers are applied to entirely new data sets. To develop clinically usable software tools for the analysis of mass spectra acquired by MALDI or SELDI hardware, a key step is to implement automated discovery of explicit "signatures", i.e. short lists of proteomic biomarkers with high discriminating power between cancer groups (Yasui and others (2003)). Some easily interpretable automatic classifiers, such as linear combinations of biomarker weights (Wang and Chang (2011)), can be found in previous studies, but these approaches do not attempt to quantify the discriminating impact of the simultaneous presence of specific pairs of biomarkers.
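To make the role of the relative accuracy ρ concrete, the following minimal Python sketch shows one way two recorded m/z ratios could be treated as the same peak when they differ by no more than ρ times the larger value. The function names `same_peak` and `tolerance_window`, the default ρ = 0.3%, and the matching rule itself are illustrative assumptions, not the procedure described in this paper.

```python
def tolerance_window(mz: float, rho: float = 0.003) -> tuple[float, float]:
    """Return the (low, high) m/z interval implied by relative accuracy rho.

    Hypothetical helper: rho = 0.003 corresponds to the 0.3% upper bound
    quoted in the text; 0.001 would correspond to the 0.1% lower bound.
    """
    return (mz * (1.0 - rho), mz * (1.0 + rho))


def same_peak(mz_a: float, mz_b: float, rho: float = 0.003) -> bool:
    """Assumed matching rule: two m/z values are considered the same peak
    if their absolute difference is at most rho times the larger value."""
    return abs(mz_a - mz_b) <= rho * max(mz_a, mz_b)


if __name__ == "__main__":
    # At 0.3% accuracy, a peak near 1000 Da has a window of about +/- 3 Da.
    print(tolerance_window(1000.0))
    print(same_peak(1000.0, 1002.0))  # difference of 2 Da, inside the window
    print(same_peak(1000.0, 1005.0))  # difference of 5 Da, outside the window
```

Because the tolerance is relative, the window widens with mass: under this assumed rule, a peak near 20,000 Daltons would tolerate a difference of roughly 60 Daltons at ρ = 0.3%, compared with about 0.6 Daltons near 200 Daltons.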
Oct-13-2014