Suckling, John
The Explanation Necessity for Healthcare AI
Mamalakis, Michail, de Vareilles, Héloïse, Murray, Graham, Lio, Pietro, Suckling, John
Explainability is often critical to the acceptable implementation of artificial intelligence (AI). Nowhere is this more important than in healthcare, where decision-making directly impacts patients and trust in AI systems is essential. This trust is often built on the explanations and interpretations the AI provides. Despite significant advancements in AI interpretability, there remains a need for clear guidelines on when, and to what extent, explanations are necessary in the medical context. We propose a novel categorization system with four distinct classes of explanation necessity, guiding the level of explanation required: patient or sample (local) level, cohort or dataset (global) level, or both levels. We introduce a mathematical formulation that distinguishes these categories and offers a practical framework for researchers to determine the necessity and depth of explanations required in medical AI applications. Three key factors are considered: the robustness of the evaluation protocol, the variability of expert observations, and the representation dimensionality of the application. In this perspective, we address the question: when does an AI medical application need to be explained, and at what level of detail?
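The abstract above names three factors feeding a categorization of explanation necessity but does not reproduce the formulation. As a purely hypothetical illustration (the function name, thresholds, and decision rule below are invented, not taken from the paper), one could imagine mapping the three factors to a coarse level of required explanation:

```python
# Hypothetical sketch only: the paper's actual mathematical formulation is not
# reproduced here. All names and thresholds below are invented for illustration.

def explanation_level(protocol_robustness: float,
                      observer_variability: float,
                      representation_dim: int) -> str:
    """Map the three factors to a coarse explanation level.

    protocol_robustness: 0..1, higher = more robust evaluation protocol
    observer_variability: 0..1, higher = experts disagree more
    representation_dim: dimensionality of the application's representation
    """
    if observer_variability > 0.5 and representation_dim > 100:
        return "local and global"   # both patient- and cohort-level explanations
    if observer_variability > 0.5:
        return "local"              # patient/sample-level explanation
    if protocol_robustness < 0.5:
        return "global"             # cohort/dataset-level explanation
    return "none"                   # robust protocol, low expert variability

print(explanation_level(0.9, 0.8, 500))  # → local and global
```

The point of the sketch is only that the decision is a function of all three factors jointly, not any single one.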
A 3D explainability framework to uncover learning patterns and crucial sub-regions in variable sulci recognition
Mamalakis, Michail, de Vareilles, Héloïse, Al-Manea, Atheer, Mitchell, Samantha C., Agartz, Ingrid, Mørch-Johnsen, Lynn Egeland, Garrison, Jane, Simons, Jon, Lio, Pietro, Suckling, John, Murray, Graham
ABSTRACT Precisely identifying sulcal features in brain MRI is made challenging by the variability of brain folding. This research introduces an innovative 3D explainability framework that validates outputs from deep learning networks in their ability to detect the paracingulate sulcus, an anatomical feature that may or may not be present on the frontal medial surface of the human brain. This study trained and tested two networks, amalgamating the local explainability techniques Grad-CAM and SHAP with a dimensionality reduction method. The explainability framework provided both localized and global explanations, along with classification accuracies, revealing pertinent sub-regions contributing to the decision process through a post-fusion transformation of explanatory and statistical features. Leveraging the TOP-OSLO dataset of MRI acquired from patients with schizophrenia, greater accuracies of paracingulate sulcus detection (presence or absence) were found in the left than the right hemisphere, with distinct but extensive sub-regions contributing to each classification outcome. The study also highlighted the critical role of an unbiased annotation protocol in maintaining the fairness of network performance. Our proposed method not only offers automated, impartial annotations of a variable sulcus but also provides insights into the broader anatomical variations associated with its presence throughout the brain. The adoption of this methodology holds promise for instigating further explorations and inquiries in the field of neuroscience.

1. Introduction

While the folding of the primary sulci of the human brain, formed during gestation, is broadly stable across individuals, the secondary sulci, which continue to develop post-natally, are unique to each individual. This inter-individual variability poses a significant challenge for the detection and accurate annotation of sulcal features from MRI of the brain.
Undertaking this task manually is time-consuming, with outcomes that depend on the rater. This prevents the efficient leveraging of the large, open-access MRI databases that are available. While primary sulci can be detected very accurately with automated methods, secondary sulci pose a more difficult computational problem due to their higher variability in shape, and indeed in presence or absence [3]. A successful automated method would facilitate investigations of brain folding variation, representative of events occurring during a critical developmental period. Furthermore, generalized and unbiased annotations would make tractable large-scale studies of cognitive and behavioral development, and of the emergence of mental and neurological disorders, with high levels of statistical power. The folding of the brain has been linked to brain function, and some specific folding patterns have been related to susceptibility to neurological adversities [20].
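The abstract describes fusing local explanations (Grad-CAM and SHAP maps) and applying dimensionality reduction. The sketch below is not the paper's implementation: it substitutes synthetic volumes for real network outputs and plain NumPy SVD for the framework's reduction step, purely to show the shape of such a pipeline.

```python
# Minimal sketch, assuming synthetic stand-ins for Grad-CAM and SHAP volumes.
# The real framework operates on explanations from trained 3D networks.
import numpy as np

rng = np.random.default_rng(0)

def normalize(m):
    """Scale a saliency volume to [0, 1) so the two maps are comparable."""
    m = m - m.min()
    return m / (m.max() + 1e-8)

# Stand-ins for per-subject saliency volumes: 20 subjects, 8x8x8 voxels each.
gradcam = rng.random((20, 8, 8, 8))
shap_vals = rng.random((20, 8, 8, 8))

# Post-fusion: average the normalized maps voxel-wise.
fused = 0.5 * (normalize(gradcam) + normalize(shap_vals))

# Dimensionality reduction via PCA (SVD on centered, flattened volumes).
flat = fused.reshape(20, -1)
flat = flat - flat.mean(axis=0)
U, S, Vt = np.linalg.svd(flat, full_matrices=False)
components = flat @ Vt[:2].T   # each subject summarized by 2 component scores

print(components.shape)  # → (20, 2)
```

In the paper's setting, the low-dimensional scores (and the voxels loading on them) are what reveal the sub-regions driving each classification outcome.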
Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability
Jiménez-Mesa, Carmen, Ramírez, Javier, Suckling, John, Vöglein, Jonathan, Levin, Johannes, Górriz, Juan Manuel, ADNI, Alzheimer's Disease Neuroimaging Initiative, DIAN, Dominantly Inherited Alzheimer Network
Discriminative analysis in neuroimaging by means of deep/machine learning techniques is usually tested with validation techniques, whereas the associated statistical significance remains largely under-developed due to its computational complexity. In this work, a non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures. In particular, a combination of autoencoders (AE) and support vector machines (SVM) is applied to: (i) one-condition, within-group designs, often of normal controls (NC); and (ii) two-condition, between-group designs which contrast, for example, Alzheimer's disease (AD) patients with NC (the extension to multi-class analyses is also included). A random-effects inference based on a label permutation test is proposed in both studies, using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods. This allows both false positives and classifier overfitting to be detected, as well as estimating the statistical power of the test. Several experiments were carried out using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, the Dominantly Inherited Alzheimer Network (DIAN) dataset, and an MCI prediction dataset. We found in the permutation test that the CV and RUB methods offer a false positive rate close to the significance level and acceptable statistical power (although lower using cross-validation). A large separation between training and test accuracies using CV was observed, especially in one-condition designs. This implies a low generalization ability, as the model fitted in training is not informative with respect to the test set. We propose RUB as a solution, whereby results similar to those of the CV test set are obtained, but considering the whole set and with a lower computational cost per iteration.
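The core of the framework above is a label-permutation test around a classifier. The sketch below shows only that skeleton, with two deliberate substitutions: a nearest-centroid rule on synthetic Gaussian data stands in for the paper's AE+SVM pipelines on neuroimaging features, and plain resubstitution accuracy stands in for the CV/RUB validation schemes.

```python
# Sketch of a label-permutation test for classification significance.
# Nearest-centroid on synthetic data replaces the paper's AE+SVM pipeline.
import numpy as np

rng = np.random.default_rng(42)

def resub_accuracy(X, y):
    """Resubstitution accuracy of a nearest-centroid classifier."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return (pred == y).mean()

# Two separated Gaussian groups, e.g. stand-ins for NC and AD feature vectors.
X = np.vstack([rng.normal(0.0, 1.0, (30, 5)), rng.normal(1.5, 1.0, (30, 5))])
y = np.array([0] * 30 + [1] * 30)

observed = resub_accuracy(X, y)

# Null distribution: shuffle the labels and recompute the statistic.
perms = 200
null = np.array([resub_accuracy(X, rng.permutation(y)) for _ in range(perms)])

# Empirical p-value with the standard +1 correction.
p_value = (np.sum(null >= observed) + 1) / (perms + 1)
print(observed, p_value)
```

The false positive rate and power reported in the abstract are properties of exactly this kind of test, evaluated over many simulated and real datasets.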
A connection between the pattern classification problem and the General Linear Model for statistical inference
Górriz, Juan Manuel, group, SiPBA, Suckling, John
A connection between the General Linear Model (GLM), in combination with classical statistical inference, and machine learning estimation (MLE)-based inference is described in this paper. Firstly, the estimation of the GLM parameters is expressed as a Linear Regression Model (LRM) of an indicator matrix, that is, in terms of the inverse problem of regressing the observations. In other words, the two approaches, GLM and LRM, operate in different domains, the observation and the label domains respectively, and are linked by a normalization value at the least-squares solution. Subsequently, from this relationship we derive a statistical test based on a more refined predictive algorithm, the (non)linear Support Vector Machine (SVM) that maximizes the class margin of separation, within a permutation analysis. The MLE-based inference employs a residual score and includes the upper bound to compute a better estimation of the actual (real) error. Experimental results demonstrate how the parameter estimations derived from each model result in different classification performances in the equivalent inverse problem. Moreover, using real data, the aforementioned predictive algorithms within permutation tests, including such model-free estimators, are able to provide a good trade-off between type I error and statistical power.
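The GLM/LRM duality described above can be made concrete with least squares on a toy two-group design. This is a minimal sketch on synthetic data, not the paper's formulation: the GLM regresses observations on an indicator design matrix, while the inverse LRM regresses the group indicator on the observations.

```python
# Sketch of the GLM vs. inverse-regression (LRM) duality on synthetic data.
import numpy as np

rng = np.random.default_rng(1)

n = 40
labels = np.array([0] * (n // 2) + [1] * (n // 2))
# One observation per subject with a true group effect of size 2.
Y = 2.0 * labels + rng.normal(0.0, 1.0, n)

# GLM: Y = X beta + e, with an intercept-plus-indicator design matrix X.
X = np.column_stack([np.ones(n), labels])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Inverse problem (LRM): regress the label indicator on the observations.
Z = np.column_stack([np.ones(n), Y])
w, *_ = np.linalg.lstsq(Z, labels.astype(float), rcond=None)

# beta[1] recovers the group effect; w[1] is the label-domain slope,
# related to beta[1] through a normalization at the least-squares solution.
print(beta[1], w[1])
```

Swapping the least-squares label regression for an SVM, and wrapping the statistic in a permutation test, gives the refined inference procedure the abstract describes.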