Accuracy
SOFAR: large-scale association network learning
Uematsu, Yoshimasa, Fan, Yingying, Chen, Kun, Lv, Jinchi, Lin, Wei
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulation and real data examples.
Pruning variable selection ensembles
Zhang, Chunxia, Wu, Yilei, Zhu, Mu
In the context of variable selection, ensemble learning has gained increasing interest due to its great potential to improve selection accuracy and to reduce false discovery rate. A novel ordering-based selective ensemble learning strategy is designed in this paper to obtain smaller but more accurate ensembles. In particular, a greedy sorting strategy is proposed to rearrange the order by which the members are included into the integration process. Through stopping the fusion process early, a smaller subensemble with higher selection accuracy can be obtained. More importantly, the sequential inclusion criterion reveals the fundamental strength-diversity trade-off among ensemble members. By taking stability selection (abbreviated as StabSel) as an example, some experiments are conducted with both simulated and real-world data to examine the performance of the novel algorithm. Experimental results demonstrate that pruned StabSel generally achieves higher selection accuracy and lower false discovery rates than StabSel and several other benchmark methods.
Exploiting random projections and sparsity with random forests and gradient boosting methods -- Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity
Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested if-then-else questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several such trees are often combined together for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. The emergence of new applications requires scalable supervised learning algorithms in terms of computational power and memory space with respect to the number of inputs, outputs, and observations without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved for which we provide and evaluate original algorithmic solutions: (i) learning over high dimensional output spaces, (ii) learning with large sample datasets and stringent memory constraints at prediction time and (iii) learning over high dimensional sparse input spaces.
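The contrast between the two ensemble styles described above can be sketched in a few lines. This is a minimal illustration, not code from the thesis: it assumes scalar inputs, squared-error loss and depth-one trees (a single if-then-else test), and the names `fit_stump`, `fit_forest` and `fit_boosting` are hypothetical.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: a single if-then-else test on a scalar input."""
    mean = sum(ys) / len(ys)
    best = None  # (squared error, threshold, left leaf value, right leaf value)
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant leaf
        return lambda x: mean
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_forest(xs, ys, n_trees, rng):
    """Average stumps trained independently on bootstrap resamples."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


def fit_boosting(xs, ys, n_trees, learning_rate=0.5):
    """Train stumps sequentially, each on the residuals left by the previous ones."""
    trees = []
    residuals = list(ys)
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        residuals = [r - learning_rate * tree(x) for x, r in zip(xs, residuals)]
        trees.append(tree)
    return lambda x: learning_rate * sum(tree(x) for tree in trees)


# Toy data: one period of a sine wave on [0, 1].
xs = [i / 99 for i in range(100)]
ys = [math.sin(2 * math.pi * x) for x in xs]
forest = fit_forest(xs, ys, n_trees=25, rng=random.Random(0))
boosted = fit_boosting(xs, ys, n_trees=25)
```

The forest's trees never see each other's output, so they could be trained in parallel; each boosting round depends on the residuals of all earlier rounds, which forces sequential training.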
A Flexible Framework for Hypothesis Testing in High-dimensions
Javanmard, Adel, Lee, Jason D.
Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples ($p> n$) and assume that the high-dimensional parameter vector is $s_0$ sparse. We develop a general and flexible $\ell_\infty$ projection statistic for hypothesis testing in this model. Our framework encompasses testing whether the parameter lies in a convex cone, testing the signal strength, testing arbitrary functionals of the parameter, and testing adaptive hypotheses. We show that the proposed procedure controls the type I error under the standard assumption of $s_0 (\log p)/\sqrt{n}\to 0$, and also analyze the power of the procedure. Our numerical experiments confirm our theoretical findings and demonstrate that we control the false positive rate (type I error) near the nominal level, and have high power.
WWE Payback 2017: Predictions, Match Card For First 'Monday Night Raw' PPV After WrestleMania
WWE's first pay-per-view since WrestleMania 33 is set for Sunday night in San Jose, California. Payback 2017 will feature mostly members of the "Monday Night Raw" roster, though a couple of "SmackDown Live" wrestlers are on the card. WWE Champion Randy Orton is still with the blue brand after the Superstar Shake-up, but he's concluding his feud with Bray Wyatt at Payback. Kevin Owens has appeared on "SmackDown Live" twice since WrestleMania 33, and he'll put his United States Championship on the line against Chris Jericho. Living up to its name, Payback features several rematches.
Fisher consistency for prior probability shift
We introduce Fisher consistency in the sense of unbiasedness as a desirable property for estimators of class prior probabilities. Lack of Fisher consistency could be used as a criterion to dismiss estimators that are unlikely to deliver precise estimates in test datasets under prior probability and more general dataset shift. The usefulness of this unbiasedness concept is demonstrated with three examples of classifiers used for quantification: Adjusted Classify & Count, EM-algorithm and CDE-Iterate. We find that Adjusted Classify & Count and EM-algorithm are Fisher consistent. A counter-example shows that CDE-Iterate is not Fisher consistent and, therefore, cannot be trusted to deliver reliable estimates of class probabilities.
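To make the first of the three quantifiers concrete: under prior probability shift the class-conditional distributions are unchanged, so a classifier's true positive rate (TPR) and false positive rate (FPR) carry over to the test set, and the expected fraction of predicted positives is $q = p\,\mathrm{TPR} + (1-p)\,\mathrm{FPR}$ for positive-class prior $p$. Adjusted Classify & Count inverts this relation. The sketch below assumes a binary problem with hard 0/1 predictions; the function name and arguments are illustrative and not taken from the paper.

```python
def adjusted_classify_and_count(predictions, tpr, fpr):
    """Estimate the positive-class prior from hard 0/1 predictions on a test set.

    tpr and fpr are the classifier's true and false positive rates, assumed to
    have been measured on labelled training data.
    """
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the adjustment to be defined")
    observed_rate = sum(predictions) / len(predictions)
    estimate = (observed_rate - fpr) / (tpr - fpr)
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, estimate))


# With a true prior of 0.3, TPR 0.8 and FPR 0.1, the expected fraction of
# predicted positives is 0.3 * 0.8 + 0.7 * 0.1 = 0.31.
test_predictions = [1] * 31 + [0] * 69
prior_estimate = adjusted_classify_and_count(test_predictions, tpr=0.8, fpr=0.1)
```

When the observed rate equals its expectation, as in this toy input, the adjustment returns the true prior up to floating-point rounding, which is the unbiasedness property the abstract refers to.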
Denoising Linear Models with Permuted Data
Pananjady, Ashwin, Wainwright, Martin J., Courtade, Thomas A.
The multivariate linear regression model with shuffled data and additive Gaussian noise arises in various correspondence estimation and matching problems. Focusing on the denoising aspect of this problem, we provide a characterization of the minimax error rate that is sharp up to logarithmic factors. We also analyze the performance of two versions of a computationally efficient estimator, and establish their consistency for a large range of input parameters. Finally, we provide an exact algorithm for the noiseless problem and demonstrate its performance on an image point-cloud matching task. Our analysis also extends to datasets with outliers.
Bootstrapping Graph Convolutional Neural Networks for Autism Spectrum Disorder Classification
Anirudh, Rushil, Thiagarajan, Jayaraman J.
Using predictive models to identify patterns that can act as biomarkers for different neuropathological conditions is becoming highly prevalent. In this paper, we consider the problem of Autism Spectrum Disorder (ASD) classification. While non-invasive imaging measurements, such as resting-state fMRI, are typically used in this problem, it can be beneficial to incorporate a wide variety of non-imaging features, including personal and socio-cultural traits, into predictive modeling. We propose to employ a graph-based approach for combining both types of features, where a contextual graph encodes the traits of a larger population while the brain activity patterns are defined as a multivariate function at the nodes of the graph. Since the underlying graph dictates the performance of the resulting predictive models, we explore the use of different graph construction strategies. Furthermore, we develop a bootstrapped version of graph convolutional neural networks (G-CNNs) that utilizes an ensemble of weakly trained G-CNNs to avoid overfitting and also reduce the sensitivity of the models to the choice of graph construction. We demonstrate its effectiveness on the Autism Brain Imaging Data Exchange (ABIDE) dataset and show that the proposed approach outperforms state-of-the-art approaches for this problem.
Data-adaptive statistics for multiple hypothesis testing in high-dimensional settings
Cai, Weixin, Hejazi, Nima S., Hubbard, Alan E.
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands -- even millions -- of null hypotheses. For high-dimensional multivariate distributions, these hypotheses may concern a wide range of parameters, with complex and unknown dependence structures among variables. In analyzing such hypothesis testing procedures, gains in efficiency and power can be achieved by performing variable reduction on the set of hypotheses prior to testing. We present in this paper an approach using data-adaptive multiple testing that serves exactly this purpose. This approach applies data mining techniques to screen the full set of covariates on equally sized partitions of the whole sample via cross-validation. This generalized screening procedure is used to create average ranks for covariates, which are then used to generate a reduced (sub)set of hypotheses, from which we compute test statistics that are subsequently subjected to standard multiple testing corrections. The principal advantage of this methodology lies in its providing valid statistical inference without specifying \textit{a priori} which hypotheses will be tested. Here, we present the theoretical details of this approach, confirm its validity via a simulation study, and exemplify its use by applying it to the analysis of data on microRNA differential expression.
Efficient variational Bayesian neural network ensembles for outlier detection
Pawlowski, Nick, Jaques, Miguel, Glocker, Ben
In this work we perform outlier detection using ensembles of neural networks obtained by variational approximation of the posterior in a Bayesian neural network setting. The variational parameters are obtained by sampling from the true posterior by gradient descent. We show our outlier detection results are comparable to those obtained using other efficient ensembling methods.