Parra, Lucas C.
Predicting breast cancer with AI for individual risk-adjusted MRI screening and early detection
Hirsch, Lukas, Huang, Yu, Makse, Hernan A., Martinez, Danny F., Hughes, Mary, Eskreis-Winkler, Sarah, Pinker, Katja, Morris, Elizabeth, Parra, Lucas C., Sutton, Elizabeth J.
Women with an increased life-time risk of breast cancer undergo supplemental annual screening MRI. We propose to predict the risk of developing breast cancer within one year based on the current MRI, with the objective of reducing screening burden and facilitating early detection. An AI algorithm was developed on 53,858 breasts from 12,694 patients who underwent screening or diagnostic MRI and accrued over 12 years, with 2,331 confirmed cancers. A first U-Net was trained to segment lesions and identify regions of concern. A second convolutional network was trained to detect malignant cancer using features extracted by the U-Net. This network was then fine-tuned to estimate the risk of developing cancer within a year in cases that radiologists considered normal or likely benign. Risk predictions from this AI were evaluated with a retrospective analysis of 9,183 breasts from a high-risk screening cohort, which were not used for training. Statistical analysis focused on the tradeoff between number of omitted exams versus negative predictive value, and number of potential early detections versus positive predictive value. The AI algorithm identified regions of concern that coincided with future tumors in 52% of screen-detected cancers. Upon directed review, a radiologist found that 71.3% of cancers had a visible correlate on the MRI prior to diagnosis, 65% of these correlates were identified by the AI model. Reevaluating these regions in 10% of all cases with higher AI-predicted risk could have resulted in up to 33% early detections by a radiologist. Additionally, screening burden could have been reduced in 16% of lower-risk cases by recommending a later follow-up without compromising current interval cancer rate. With increasing datasets and improving image quality we expect this new AI-aided, adaptive screening to meaningfully reduce screening burden and improve early detection.
Correlated Components Analysis --- Extracting Reliable Dimensions in Multivariate Data
Parra, Lucas C., Haufe, Stefan, Dmochowski, Jacek P.
How does one find data dimensions that are reliably expressed across repetitions? For example, in neuroscience one may want to identify combinations of brain signals that are reliably activated across multiple trials or subjects. For a clinical assessment with multiple ratings, one may want to identify an aggregate score that is reliably reproduced across raters. The approach proposed here --- "correlated components analysis" --- is to identify components that maximally correlate between repetitions (e.g. trials, subjects, raters). This can be expressed as the maximization of the ratio of between-repetition to within-repetition covariance, resulting in a generalized eigenvalue problem. We show that covariances can be computed efficiently without explicitly considering all pairs of repetitions, that the result is equivalent to multi-class linear discriminant analysis for unbiased signals, and that the approach also maximize reliability, defined as the mean divided by the deviation across repetitions. We also extend the method to non-linear components using kernels, discuss regularization to improve numerical stability, present parametric and non-parametric tests to establish statistical significance, and provide code.
Adaptive Template Matching with Shift-Invariant Semi-NMF
Roux, Jonathan L., Cheveignรฉ, Alain D., Parra, Lucas C.
How does one extract unknown but stereotypical events that are linearly superimposed within a signal with variable latencies and variable amplitudes? One could think of using template matching or matching pursuit to find the arbitrarily shifted linear components. However, traditional matching approaches require that the templates be known a priori. To overcome this restriction we use instead semi Non-Negative Matrix Factorization (semi-NMF) that we extend to allow for time shifts when matching the templates to the signal. The algorithm estimates templates directly from the data along with their non-negative amplitudes. The resulting method can be thought of as an adaptive template matching procedure. We demonstrate the procedure on the task of extracting spikes from single channel extracellular recordings. On these data the algorithm essentially performs spike detection and unsupervised spike clustering. Results on simulated data and extracellular recordings indicate that the method performs well for signal-to-noise ratios of 6dB or higher and that spike templates are recovered accurately provided they are sufficiently different.
Second Order Bilinear Discriminant Analysis for single trial EEG analysis
Christoforou, Christoforos, Sajda, Paul, Parra, Lucas C.
Traditional analysis methods for single-trial classification of electro-encephalography (EEG) focus on two types of paradigms: phase locked methods, in which the amplitude of the signal is used as the feature for classification, i.e. event related potentials; and second order methods, in which the feature of interest is the power of the signal, i.e event related (de)synchronization. The process of deciding which paradigm to use is ad hoc and is driven by knowledge of neurological findings. Here we propose a unified method in which the algorithm learns the best first and second order spatial and temporal features for classification of EEG based on a bilinear model. The efficiency of the method is demonstrated in simulated and real EEG from a benchmark data set for Brain Computer Interface.
Maximising Sensitivity in a Spiking Network
Bell, Anthony J., Parra, Lucas C.
We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A), in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output'firing rates' to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.
Maximising Sensitivity in a Spiking Network
Bell, Anthony J., Parra, Lucas C.
We use unsupervised probabilistic machine learning ideas to try to explain thekinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes.For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A), in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measurerelated to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings ismaximised, rather than the sensitivity of output'firing rates' to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models,and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, weshow that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent pointprocesses embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.
Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
Parra, Lucas C., Spence, Clay, Sajda, Paul
We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1 Introduction Recently, considerable attention has been paid to understanding and modeling the non-Gaussian or "higher-order" properties of natural signals, particularly images. Several non-Gaussian properties have been identified and studied.
Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
Parra, Lucas C., Spence, Clay, Sajda, Paul
We present evidence that several higher-order statistical properties of natural images and signals can be explained by a stochastic model which simply varies scale of an otherwise stationary Gaussian process. We discuss two interesting consequences. The first is that a variety of natural signals can be related through a common model of spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases the non-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1 Introduction Recently, considerable attention has been paid to understanding and modeling the non-Gaussian or "higher-order" properties of natural signals, particularly images. Several non-Gaussian properties have been identified and studied.
Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
Parra, Lucas C., Spence, Clay, Sajda, Paul
The first is that a variety of natural signals can be related through a common modelof spherically invariant random processes, which have the attractive property that the joint densities can be constructed from the one dimensional marginal. The second is that in some cases thenon-stationarity assumption and only second order methods can be explicitly exploited to find a linear basis that is equivalent to independent components obtained with higher-order methods. This is demonstrated on spectro-temporal components of speech. 1 Introduction Recently, considerable attention has been paid to understanding and modeling the non-Gaussian or "higher-order" properties of natural signals, particularly images. Several non-Gaussian properties have been identified and studied. For example, marginal densities of features have been shown to have high kurtosis or "heavy tails", indicating a non-Gaussian, sparse representation. Another example is the "bowtie" shape of conditional distributions of neighboring features, indicating dependence ofvariances [11]. These non-Gaussian properties have motivated a number of image and signal processing algorithms that attempt to exploit higher-order s tatistics of the signals, e.g., for blind source separation. In this paper we show that these previously observed higher-order phenomena are ubiquitous and can be accounted for by a model which simply varies the scale of an otherwise stationary Gaussianprocess. This enables us to relate a variety of natural signals to one another and to spherically invariant random processes, which are well-known in the signal processing literature [6, 3].
Hierarchical Image Probability (H1P) Models
Spence, Clay, Parra, Lucas C.
We formulate a model for probability distributions on image spaces. We show that any distribution of images can be factored exactly into conditional distributionsof feature vectors at one resolution (pyramid level) conditioned on the image information at lower resolutions. We would like to factor this over positions in the pyramid levels to make it tractable, but such factoring may miss long-range dependencies. To fix this, we introduce hiddenclass labels at each pixel in the pyramid. The result is a hierarchical mixture of conditional probabilities, similar to a hidden Markov model on a tree. The model parameters can be found with maximum likelihoodestimation using the EM algorithm. We have obtained encouraging preliminary results on the problems of detecting various objects inSAR images and target recognition in optical aerial images. 1 Introduction