Hinrichs, Chris, Ithapu, Vamsi K., Sun, Qinyuan, Johnson, Sterling C., Singh, Vikas

Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. To correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well-known Bonferroni correction method, while simple to implement, is quite conservative and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given $\alpha$-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we show that permutation testing in fact amounts to populating the columns of a very large matrix ${\bf P}$. By analyzing the spectrum of this matrix, we see that, under certain conditions, ${\bf P}$ has a low-rank plus low-variance residual decomposition, which makes it suitable for highly sub-sampled (on the order of $0.5\%$) matrix completion methods. Based on this observation, we propose a novel permutation testing methodology which offers a large speedup without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly $50\times$ can be achieved while recovering the FWER distribution to very high accuracy. Further, we show that the estimated $\alpha$-threshold is also recovered faithfully and is stable.
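
To make the baseline concrete, the following is a minimal sketch of standard max-statistic permutation testing, the procedure whose cost the abstract describes. It is not the authors' implementation: the two-group design, the one-sided Welch t-statistic, the function name `max_null_threshold`, and the synthetic data are all assumptions chosen for illustration. Each permutation corresponds to one column of ${\bf P}$, of which only the maximum is retained.

```python
import numpy as np

def max_null_threshold(data, labels, n_perm=1000, alpha=0.05, seed=0):
    """Estimate an FWER-controlling threshold from the max null distribution.

    data:   (n_subjects, n_voxels) array of measurements
    labels: (n_subjects,) boolean array, True for the first group
    """
    rng = np.random.default_rng(seed)

    def t_stats(lab):
        a, b = data[lab], data[~lab]
        # Welch two-sample t-statistic, computed for all voxels at once
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        return (a.mean(axis=0) - b.mean(axis=0)) / se

    observed = t_stats(labels)
    max_null = np.empty(n_perm)
    for j in range(n_perm):
        # One relabelling yields one column of P; keep only its maximum
        max_null[j] = t_stats(rng.permutation(labels)).max()

    # The (1 - alpha) quantile of the maxima is the corrected threshold
    threshold = np.quantile(max_null, 1.0 - alpha)
    return observed, max_null, threshold

# Hypothetical example: 40 subjects, 500 voxels, no true group effect
rng = np.random.default_rng(1)
data = rng.standard_normal((40, 500))
labels = np.arange(40) < 20
observed, max_null, thr = max_null_threshold(data, labels, n_perm=200)
significant = observed > thr  # voxels declared significant at FWER level alpha
```

The loop evaluates every voxel for every permutation, which is the cost that the matrix completion approach aims to avoid.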

Hinrichs, Chris, Ithapu, Vamsi K., Sun, Qinyuan, Johnson, Sterling C., Singh, Vikas

Multiple hypothesis testing is a significant problem in nearly all neuroimaging studies. To correct for this phenomenon, we require a reliable estimate of the Family-Wise Error Rate (FWER). The well-known Bonferroni correction method, while simple to implement, is quite conservative and can substantially under-power a study because it ignores dependencies between test statistics. Permutation testing, on the other hand, is an exact, non-parametric method of estimating the FWER for a given α-threshold, but for acceptably low thresholds the computational burden can be prohibitive. In this paper, we observe that permutation testing in fact amounts to populating the columns of a very large matrix P. By analyzing the spectrum of this matrix, we see that, under certain conditions, P has a low-rank plus low-variance residual decomposition, which makes it suitable for highly sub-sampled (on the order of 0.5%) matrix completion methods. Thus, we propose a novel permutation testing methodology which offers a large speedup without sacrificing the fidelity of the estimated FWER. Our evaluations on four different neuroimaging datasets show that a computational speedup factor of roughly 50× can be achieved while recovering the FWER distribution to very high accuracy. Further, we show that the estimated α-threshold is also recovered faithfully and is stable.

Gutierrez-Barragan, Felipe, Ithapu, Vamsi K., Hinrichs, Chris, Maumet, Camille, Johnson, Sterling C., Nichols, Thomas E., Singh, Vikas, ADNI, the

Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected $p$-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, $T$, one can exploit structural properties derived from the data and the test statistics to reduce the runtime under certain conditions. In particular, we see that $T$ is low-rank plus a low-variance residual. This makes $T$ a good candidate for low-rank matrix completion, where only a very small number of entries of $T$ ($\sim0.35\%$ of all entries in our experiments) have to be computed to obtain a good estimate. Based on this observation, we present RapidPT, an algorithm that efficiently recovers the max null distribution commonly obtained through regular permutation testing in voxel-wise analysis. We present an extensive validation on a synthetic dataset and four datasets of varying size against two baselines: Statistical NonParametric Mapping (SnPM13) and a standard permutation testing implementation (referred to as NaivePT). We find that RapidPT achieves its best runtime performance on medium-sized datasets ($50 \leq n \leq 200$), with speedups of 1.5x - 38x (vs. SnPM13) and 20x - 1000x (vs. NaivePT). For larger datasets ($n \geq 200$), RapidPT outperforms NaivePT (6x - 200x) on all datasets, and provides large speedups over SnPM13 (2x - 15x) when more than 10000 permutations are needed. The implementation is available as a standalone toolbox and is also integrated within SnPM13; it can leverage multi-core architectures when available.
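
The idea of recovering a low-rank matrix from a small fraction of its entries can be illustrated with a simplified sketch. This is not the RapidPT algorithm: it uses a synthetic stand-in for $T$, learns a column basis from a few fully computed columns, and fits each remaining column by least squares on a random subset of its entries. The matrix sizes, rank, noise level, and sampling rate are all assumptions, and the residual handling that a max null estimate would need is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vox, n_perm, rank = 2000, 400, 5

# Synthetic stand-in for T: a low-rank part plus a low-variance residual
T = rng.standard_normal((n_vox, rank)) @ rng.standard_normal((rank, n_perm))
T += 0.01 * rng.standard_normal((n_vox, n_perm))

# Training phase: compute a few columns in full and extract a basis for the column space
n_train = 20
U, _, _ = np.linalg.svd(T[:, :n_train], full_matrices=False)
U = U[:, :rank]

# Recovery phase: for each remaining column, observe only a few entries
n_obs = 40  # 2% of the entries in each column
T_hat = np.empty((n_vox, n_perm - n_train))
for j in range(n_train, n_perm):
    idx = rng.choice(n_vox, size=n_obs, replace=False)
    # Fit the basis coefficients using the observed entries only
    coef, *_ = np.linalg.lstsq(U[idx], T[idx, j], rcond=None)
    T_hat[:, j - n_train] = U @ coef

# Relative Frobenius error of the recovered columns
rel_err = np.linalg.norm(T_hat - T[:, n_train:]) / np.linalg.norm(T[:, n_train:])
```

Because the reconstruction captures only the low-rank part, the maximum of each recovered column omits the residual's contribution, so a usable max null estimate would require an additional correction for it.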

Least-squares models such as linear regression and Linear Discriminant Analysis (LDA) are amongst the most popular statistical learning techniques. However, since their computation time increases cubically with the number of features, they are inefficient for high-dimensional neuroimaging datasets. Fortunately, for k-fold cross-validation, an analytical approach has been developed that yields the exact cross-validated predictions in least-squares models without explicitly training the model. Its computation time grows with the number of test samples. Here, this approach is systematically investigated in the context of cross-validation and permutation testing. LDA is used as an example, but the results hold for all other least-squares methods. Furthermore, a non-trivial extension to multi-class LDA is formally derived. The analytical approach is evaluated using complexity calculations, simulations, and permutation testing of an EEG/MEG dataset. Depending on the ratio between features and samples, the analytical approach is up to 10,000x faster than the standard approach (retraining the model on each training set). This allows fast cross-validation of least-squares models and multi-class LDA in high-dimensional data, with obvious applications in multi-dimensional datasets, Representational Similarity Analysis, and permutation testing.
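
The analytical shortcut can be checked numerically in the simplest case. The sketch below assumes ordinary least-squares regression with more samples than features, not the regularised or LDA variants treated in the source, and uses synthetic data. It relies on a standard identity: with hat matrix $H = X(X^\top X)^{-1}X^\top$ and full-data residuals $e$, the cross-validated residuals for a test fold $S$ are $(I - H_{SS})^{-1} e_S$. The linear system solved per fold has the size of the test fold, which is why the cost grows with the number of test samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 60, 5, 5

# Synthetic regression problem with an intercept column
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
y = X @ rng.standard_normal(p + 1) + 0.5 * rng.standard_normal(n)

# Fit once on all samples
H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
resid = y - H @ y                      # full-data residuals

folds = np.array_split(rng.permutation(n), k)
cv_analytic = np.empty(n)
cv_retrain = np.empty(n)
for test in folds:
    # Analytical route: cross-validated residuals from the single full fit
    H_ss = H[np.ix_(test, test)]
    e_cv = np.linalg.solve(np.eye(len(test)) - H_ss, resid[test])
    cv_analytic[test] = y[test] - e_cv

    # Standard route: retrain on the training fold, then predict the test fold
    train = np.setdiff1d(np.arange(n), test)
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    cv_retrain[test] = X[test] @ beta

# Both routes give the same cross-validated predictions
agree = np.allclose(cv_analytic, cv_retrain)
```

In a permutation test only `y` changes between permutations, so `H` can be computed once and reused, which is where most of the saving comes from in this simplified setting.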

Winkler, Anderson M., Renaud, Olivier, Smith, Stephen M., Nichols, Thomas E.

Canonical correlation analysis (CCA) has become a key tool for population neuroimaging because it allows the investigation of associations between many imaging and non-imaging variables. As age, sex and other variables are often a source of variability not of direct interest, previous work has used CCA on residuals from a model that removes these effects, then proceeded directly to permutation inference. We show that a simple permutation test, as typically used to identify significant modes of shared variation on such data adjusted for nuisance variables, produces inflated error rates. The reason is that residualisation introduces dependencies among the observations that violate the exchangeability assumption. Even in the absence of nuisance variables, however, a simple permutation test for CCA also leads to excess error rates for all canonical correlations other than the first. The reason is that a simple permutation scheme does not ignore the variability already explained by canonical variables of lower rank. Here we propose solutions for both problems: in the case of nuisance variables, we show that projecting the residuals to a lower dimensional space where exchangeability holds results in a valid permutation test; for more general cases, with or without nuisance variables, we propose estimating the canonical correlations in a stepwise manner, removing at each iteration the variance already explained. We also discuss how to address the multiplicity of tests via closure, which leads to an admissible test that is not conservative. Finally, we provide a complete algorithm for permutation inference for CCA.