Kiar, Gregory
Numerical Stability of DeepGOPlus Inference
Pepe, Inés Gonzalez, Chatelain, Yohan, Kiar, Gregory, Glatard, Tristan
Convolutional neural networks (CNNs) are currently among the most widely used deep neural network (DNN) architectures and achieve state-of-the-art performance on many problems. Originally applied to computer vision tasks, CNNs also work well with other data exhibiting spatial relationships and have been applied across many fields. However, recent works have highlighted numerical stability challenges in DNNs, which also relate to their known sensitivity to noise injection; these challenges can jeopardise their performance and reliability. This paper investigates DeepGOPlus, a CNN that predicts protein function. DeepGOPlus has achieved state-of-the-art performance and can successfully annotate the abundance of protein sequences emerging in proteomics. We determine the numerical stability of the model's inference stage by quantifying the numerical uncertainty due to perturbations of the underlying floating-point data. In addition, we explore the opportunity to use reduced-precision floating-point formats for DeepGOPlus inference to reduce memory consumption and latency. This is achieved by instrumenting DeepGOPlus' execution with Monte Carlo Arithmetic, a technique that experimentally quantifies floating-point operation errors, and with VPREC, a tool that emulates results with customizable floating-point precision formats. We focus on the inference stage as it is the primary deliverable of the DeepGOPlus model and is widely applicable across different environments. Overall, our results show that, although the DeepGOPlus CNN is numerically very stable, it can only be selectively implemented with lower-precision floating-point formats. We conclude that predictions obtained from the pre-trained DeepGOPlus model are numerically very reliable and use existing floating-point formats efficiently.
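To make the uncertainty-quantification idea concrete, the following is a minimal, hypothetical sketch (not code from the paper): it summarizes the spread of repeated inference runs whose floating-point operations were randomly perturbed, in the spirit of Monte Carlo Arithmetic, by estimating the number of significant decimal digits per output. The function name and the simulated perturbation are illustrative assumptions.

```python
# Hypothetical sketch: estimating significant digits from perturbed inference runs.
import numpy as np

def significant_digits(samples: np.ndarray) -> np.ndarray:
    """Estimate significant decimal digits per output.

    samples: array of shape (n_runs, n_outputs), one row per perturbed run
    (e.g., produced under Monte Carlo Arithmetic instrumentation).
    """
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        digits = -np.log10(std / np.abs(mean))
    # Where the spread is zero, report the full precision of the format used.
    return np.where(std == 0, np.finfo(samples.dtype).precision, digits)

# Example with stand-in data: 10 "perturbed" runs over 3 prediction scores.
rng = np.random.default_rng(0)
scores = np.array([0.91, 0.42, 0.07])
runs = scores + rng.normal(scale=1e-7, size=(10, 3))  # simulated MCA-style noise
print(significant_digits(runs))
```

In such an analysis, a high number of significant digits across outputs would indicate that the inference stage is numerically stable under the applied perturbations.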
Pipeline-Invariant Representation Learning for Neuroimaging
Li, Xinhui, Fedorov, Alex, Mathur, Mrinal, Abrol, Anees, Kiar, Gregory, Plis, Sergey, Calhoun, Vince
Deep learning has been widely applied in neuroimaging, including for predicting brain-phenotype relationships from magnetic resonance imaging (MRI) volumes. MRI data usually requires extensive preprocessing prior to modeling, but the variation introduced by different MRI preprocessing pipelines may lead to different scientific findings, even when using identical data. Motivated by a data-centric perspective, we first evaluate how preprocessing pipeline selection can impact the downstream performance of a supervised learning model. We next propose two pipeline-invariant representation learning methodologies, MPSL and PXL, to improve robustness in classification performance and to capture similar neural network representations. Using 2000 human subjects from the UK Biobank dataset, we demonstrate that the proposed models present both unique and shared advantages; in particular, MPSL can be used to improve out-of-sample generalization to new pipelines, while PXL can be used to improve within-sample prediction performance. Both MPSL and PXL also learn more similar between-pipeline representations. These results suggest that our proposed models can be applied to mitigate pipeline-related biases and to improve prediction robustness in brain-phenotype modeling.
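As an illustration of the general pipeline-invariance idea (this is not the paper's MPSL or PXL implementation), the sketch below shows a toy training step in which an encoder sees the same subjects preprocessed by two different pipelines and is penalized both for misclassifying the phenotype and for producing dissimilar between-pipeline representations. The architecture, loss weighting, and data shapes are all hypothetical.

```python
# Hypothetical sketch: encouraging pipeline-invariant representations with a
# supervised term plus a between-pipeline alignment term.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU())
classifier = nn.Linear(64, 2)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3
)

def training_step(x_pipe_a, x_pipe_b, labels, align_weight=0.5):
    """x_pipe_a / x_pipe_b: the same subjects preprocessed by two pipelines."""
    z_a, z_b = encoder(x_pipe_a), encoder(x_pipe_b)
    # Supervised term: predict the phenotype from either pipeline's representation.
    ce = F.cross_entropy(classifier(z_a), labels) + F.cross_entropy(classifier(z_b), labels)
    # Alignment term: pull the two pipelines' representations of each subject together.
    align = 1 - F.cosine_similarity(z_a, z_b, dim=1).mean()
    loss = ce + align_weight * align
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random stand-in data (single-channel 32x32 "slices").
xa, xb = torch.randn(8, 1, 32, 32), torch.randn(8, 1, 32, 32)
y = torch.randint(0, 2, (8,))
print(training_step(xa, xb, y))
```

The alignment weight trades off within-sample accuracy against between-pipeline representational similarity; methods like MPSL and PXL formalize this trade-off in different ways.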