Projection pursuit with applications to scRNA sequencing data
In this paper, we explore the limitations of PCA as a dimension reduction technique and study its extension, projection pursuit (PP), which is a broad class of linear dimension reduction methods. PCA is a popular dimension reduction technique commonly applied to scRNA sequencing data. Despite its huge success in practice, we illustrate three drawbacks of PCA. It is well known that the eigenvalues of the sample covariance matrix are not consistent in high-dimensional cases. The principal components are mutually uncorrelated but not necessarily independent.
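The distinction between uncorrelated and independent can be checked on a toy example. The sketch below is not taken from the paper: the data and the helper `covariance` are assumptions chosen so that one variable is a deterministic function of the other while their covariance is exactly zero.

```python
from statistics import fmean

def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

# Toy data (an assumption for illustration): x is symmetric about zero and
# y is a deterministic function of x, so the two are fully dependent.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

# No linear relationship: the covariance of x and y vanishes.
cov_x_y = covariance(xs, ys)

# A nonlinear transform of x is correlated with y, which could not happen
# if x and y were independent.
cov_x2_y = covariance([x * x for x in xs], ys)

print(cov_x_y, cov_x2_y)
```

Zero covariance only rules out a linear relationship, which is why methods that look beyond second moments, such as projection pursuit, can reveal structure that PCA misses.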
Network of Evolvable Neural Units: Evolving to Learn at a Synaptic Level
Bertens, Paul, Lee, Seong-Whan
Although Deep Neural Networks have seen great success in recent years through various changes in overall architectures and optimization strategies, their fundamental underlying design remains largely unchanged. Computational neuroscience on the other hand provides more biologically realistic models of neural processing mechanisms, but they are still high level abstractions of the actual experimentally observed behaviour. Here a model is proposed that bridges Neuroscience, Machine Learning and Evolutionary Algorithms to evolve individual soma and synaptic compartment models of neurons in a scalable manner. Instead of attempting to manually derive models for all the observed complexity and diversity in neural processing, we propose an Evolvable Neural Unit (ENU) that can approximate the function of each individual neuron and synapse. We demonstrate that this type of unit can be evolved to mimic Integrate-And-Fire neurons and synaptic Spike-Timing-Dependent Plasticity. Additionally, by constructing a new type of neural network where each synapse and neuron is such an evolvable neural unit, we show it is possible to evolve an agent capable of learning to solve a T-maze environment task. This network independently discovers spiking dynamics and reinforcement type learning rules, opening up a new path towards biologically inspired artificial intelligence.
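For readers unfamiliar with the target behaviour, the following is a minimal sketch of a leaky integrate-and-fire neuron, the kind of dynamics the abstract says an ENU can be evolved to mimic. It is not the ENU model itself; the function name `simulate_lif` and all parameter values are assumptions chosen for illustration.

```python
def simulate_lif(input_current, n_steps=1000, dt=1e-3, tau=0.02,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Euler simulation of a leaky integrate-and-fire neuron.

    Returns the list of spike times in seconds. All parameter values are
    illustrative assumptions, not values from the paper.
    """
    v = v_rest
    spike_times = []
    for step in range(n_steps):
        # Membrane potential leaks toward rest and is driven by the input.
        v += dt / tau * (-(v - v_rest) + input_current)
        if v >= v_thresh:
            # Threshold crossing: record a spike and reset the potential.
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

weak = simulate_lif(0.5)     # drive below threshold
medium = simulate_lif(2.0)
strong = simulate_lif(4.0)
print(len(weak), len(medium), len(strong))
```

A drive that cannot push the potential to threshold produces no spikes, and a stronger drive produces a higher firing rate.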
Statistical significance in high-dimensional linear mixed models
Lin, Lina, Drton, Mathias, Shojaie, Ali
This paper concerns the development of an inferential framework for high-dimensional linear mixed effect models. These are suitable models, for instance, when we have $n$ repeated measurements for $M$ subjects. We consider a scenario where the number of fixed effects $p$ is large (and may be larger than $M$), but the number of random effects $q$ is small. Our framework is inspired by a recent line of work that proposes de-biasing penalized estimators to perform inference for high-dimensional linear models with fixed effects only. In particular, we demonstrate how to correct a 'naive' ridge estimator in extension of work by Bühlmann (2013) to build asymptotically valid confidence intervals for mixed effect models. We validate our theoretical results with numerical experiments, in which we show our method outperforms those that fail to account for correlation induced by the random effects. For a practical demonstration we consider a riboflavin production dataset that exhibits group structure, and show that conclusions drawn using our method are consistent with those obtained on a similar dataset without group structure.
Self-Play Learning Without a Reward Metric
Schmidt, Dan, Moran, Nick, Rosenfeld, Jonathan S., Rosenthal, Jonathan, Yedidia, Jonathan
The AlphaZero algorithm for the learning of strategy games via self-play, which has produced superhuman ability in the games of Go, chess, and shogi, uses a quantitative reward function for game outcomes, requiring the users of the algorithm to explicitly balance different components of the reward against each other, such as the game winner and margin of victory. We present a modification to the AlphaZero algorithm that requires only a total ordering over game outcomes, obviating the need to perform any quantitative balancing of reward components. We demonstrate that this system learns optimal play in a comparable amount of time to AlphaZero on a sample game.
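The abstract does not say how the ordering is used during training, so the following is only a hypothetical sketch of the general idea: outcomes are compared through a sort key rather than a weighted numeric reward, and a training target is derived purely from comparisons against a set of reference outcomes. The names `Outcome`, `preference_key`, and `ordinal_value` are assumptions introduced here.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple

class Outcome(NamedTuple):
    won: bool     # whether the player won the game
    margin: int   # margin of victory (negative for a loss)

def preference_key(outcome):
    """Total ordering: winning matters first, margin only breaks ties.

    No numeric weight trades off the two components.
    """
    return (outcome.won, outcome.margin)

def ordinal_value(outcome, reference):
    """Map an outcome to [-1, 1] using only comparisons.

    The value is the share of reference outcomes that are strictly worse
    minus the share that are strictly better (a hypothetical choice).
    """
    keys = sorted(preference_key(o) for o in reference)
    key = preference_key(outcome)
    worse = bisect_left(keys, key)
    better = len(keys) - bisect_right(keys, key)
    return (worse - better) / len(keys)

reference = [Outcome(False, -5), Outcome(False, -1),
             Outcome(True, 1), Outcome(True, 7)]
best = max(reference, key=preference_key)
narrow_win_value = ordinal_value(Outcome(True, 3), reference)
loss_value = ordinal_value(Outcome(False, -3), reference)
print(best, narrow_win_value, loss_value)
```

Changing the ordering, for example to ignore the margin, only requires a different key function; no reward weights need retuning.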
A Robust Spectral Clustering Algorithm for Sub-Gaussian Mixture Models with Outliers
Srivastava, Prateek R., Sarkar, Purnamrita, Hanasusanto, Grani A.
We consider the problem of clustering datasets in the presence of arbitrary outliers. Traditional clustering algorithms such as k-means and spectral clustering are known to perform poorly for datasets contaminated with even a small number of outliers. In this paper, we develop a provably robust spectral clustering algorithm that applies a simple rounding scheme to denoise a Gaussian kernel matrix built from the data points, and uses vanilla spectral clustering to recover the cluster labels of data points. We analyze the performance of our algorithm under the assumption that the "good" inlier data points are generated from a mixture of sub-Gaussians, while the "noisy" outlier points can come from any arbitrary probability distribution. For this general class of models, we show that the asymptotic misclassification error decays at an exponential rate in the signal-to-noise ratio, provided the number of outliers is a small fraction of the number of inlier points. Surprisingly, the derived error bound matches the best-known bound for semidefinite programs (SDPs) under the same setting without outliers. We conduct extensive experiments on a variety of simulated and real-world datasets to demonstrate that our algorithm is less sensitive to outliers than other state-of-the-art algorithms proposed in the literature, in terms of both accuracy and scalability.
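The first stage of such a pipeline can be sketched as follows. This is not the authors' algorithm: the abstract does not specify the rounding scheme, so a simple entrywise threshold is assumed here, and the bandwidth, threshold, and toy points are arbitrary illustrative choices. The spectral clustering step that would follow is omitted.

```python
import math

def gaussian_kernel(points, bandwidth):
    """Dense Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 h^2))."""
    n = len(points)
    return [[math.exp(-math.dist(points[i], points[j]) ** 2
                      / (2 * bandwidth ** 2))
             for j in range(n)] for i in range(n)]

def round_kernel(kernel, threshold):
    """Assumed rounding rule: keep an edge only if similarity >= threshold."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in kernel]

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # first tight group
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # second tight group
          (50.0, -40.0)]                        # a far-away outlier

adjacency = round_kernel(gaussian_kernel(points, bandwidth=1.0), threshold=0.5)

# Number of neighbours of each point, excluding the point itself.
degrees = [sum(row) - 1 for row in adjacency]
print(degrees)
```

After rounding, points within a group remain connected to each other, the two groups share no edges, and the outlier is isolated, which is the kind of denoised input a standard spectral clustering step can then work with.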
Realization of spatial sparseness by deep ReLU nets with massive data
Chui, Charles K., Lin, Shao-Bo, Zhang, Bo, Zhou, Ding-Xuan
The great success of deep learning poses urgent challenges for understanding its working mechanism and rationality. The depth, structure, and massive size of the data are recognized to be three key ingredients for deep learning. In this paper, we aim at rigorous verification of the importance of massive data in embodying the outperformance of deep learning. To approximate and learn spatially sparse and smooth functions, we establish a novel sampling theorem in learning theory to show the necessity of massive data. We then prove that implementing the classical empirical risk minimization on some deep nets facilitates the realization of the optimal learning rates derived in the sampling theorem. This perhaps explains why deep learning performs so well in the era of big data.
With the rapid development of data mining and knowledge discovery, data of massive size are collected in various disciplines [50], including medical diagnosis, financial market analysis, computer vision, natural language processing, time series forecasting, and search engines. These massive data bring additional opportunities to discover subtle data features that cannot be reflected by data of small size, while creating a crucial challenge for machine learning to develop learning schemes that realize these benefits by exploiting massive data. Although numerous learning schemes such as distributed learning [26], localized learning [32] and sub-sampling [14] have been proposed to handle massive data, all these schemes focus on tractability rather than on the benefit of massiveness. Therefore, it remains open to explore the benefits brought by massive data and to develop feasible learning strategies for realizing these benefits.
Deep learning [18], characterized by training deep neural networks (deep nets for short) to extract data features by using rich computational resources such as the computational power of modern graphics processing units (GPUs) and custom processors, has achieved remarkable success in computer vision [23], speech recognition [24] and game theory [40], practically showing its power in tackling massive data.
C.K. Chui is also associated with the Department of Statistics, Stanford University, CA 94305, USA. Shao-Bo Lin is with the Center of Intelligent Decision-making and Machine Learning, School of Management, Xi'an Jiaotong University, Xi'an, China.
On-manifold Adversarial Data Augmentation Improves Uncertainty Calibration
Patel, Kanil, Beluch, William, Zhang, Dan, Pfeiffer, Michael, Yang, Bin
Uncertainty estimates help to identify ambiguous, novel, or anomalous inputs, but the reliable quantification of uncertainty has proven to be challenging for modern deep networks. To improve uncertainty estimation, we propose On-Manifold Adversarial Data Augmentation or OMADA, which specifically attempts to generate the most challenging examples by following an on-manifold adversarial attack path in the latent space of an autoencoder-based generative model that closely approximates decision boundaries between two or more classes. On a variety of datasets and for multiple network architectures, OMADA consistently yields more accurate and better calibrated classifiers than baseline models, and outperforms competing approaches such as Mixup and CutMix, as well as achieving similar performance to (at times better than) post-processing calibration methods such as temperature scaling. Variants of OMADA can employ different sampling schemes for ambiguous on-manifold examples based on the entropy of their estimated soft labels, which exhibit specific strengths for generalization, calibration of predicted uncertainty, or detection of out-of-distribution inputs.
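Temperature scaling, one of the post-processing baselines named above, can be sketched in a few lines. This is the standard technique, not OMADA itself; the logits and the temperature value below are made up for illustration, and in practice the temperature is usually fitted on held-out validation data by minimizing the negative log-likelihood.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]                     # hypothetical classifier output
raw = softmax(logits)                        # uncalibrated probabilities
cooled = softmax(logits, temperature=2.0)    # temperature > 1 softens them

print(max(raw), max(cooled))
```

Dividing by a temperature above 1 lowers the top-class confidence without changing which class is predicted, so accuracy is unaffected while calibration can improve.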
VarNet: Variational Neural Networks for the Solution of Partial Differential Equations
Khodayi-Mehr, Reza, Zavlanos, Michael M.
In this paper we propose a new model-based unsupervised learning method, called VarNet, for the solution of partial differential equations (PDEs) using deep neural networks (NNs). In particular, we propose a novel loss function that relies on the variational (integral) form of PDEs as opposed to their differential form, which is commonly used in the literature. Our loss function is discretization-free, highly parallelizable, and more effective in capturing the solution of PDEs since it employs lower-order derivatives and trains over regions of space-time with non-zero measure. Given this loss function, we also propose an approach to optimally select the space-time samples used to train the NN, based on the feedback provided by the PDE residual. The models obtained using VarNet are smooth and do not require interpolation. They are also easily differentiable and can directly be used for control and optimization of PDEs. Finally, VarNet can straightforwardly incorporate parametric PDE models, making it a natural tool for model order reduction (MOR) of PDEs. We demonstrate the performance of our method through extensive numerical experiments for the advection-diffusion PDE as an important case study.
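To see why an integral form needs only lower-order derivatives, consider a one-dimensional steady advection-diffusion equation as a simplified stand-in. This is an illustration under assumed notation, not the loss used in the paper: the constant coefficients $\kappa$ and $a$, the source $s$, and the test function $v$ are introduced here. Multiplying the equation by a smooth test function $v$ with $v(0) = v(1) = 0$ and integrating the diffusion term by parts moves one derivative from $u$ onto $v$:

$$
-\kappa u''(x) + a u'(x) = s(x), \quad x \in (0,1)
\quad\Longrightarrow\quad
\int_0^1 \left( \kappa u'(x) v'(x) + a u'(x) v(x) - s(x) v(x) \right) dx = 0 .
$$

The differential form involves $u''$, whereas the integral form involves only $u'$, and it is evaluated over the support of $v$ rather than at isolated points.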
Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football
Bonello, Nicholas, Beel, Joeran, Lawless, Seamus, Debattista, Jeremy
Fantasy Premier League (FPL) performance predictors tend to base their algorithms purely on historical statistical data. The main problem with this approach is that external factors such as injuries, managerial decisions and other tournament match statistics can never be factored into the final predictions. In this paper, we present a new method for predicting future player performances by automatically incorporating human feedback into our model. By analysing statistical data such as previous performances and upcoming fixture difficulty ratings, together with betting market analysis and the opinions of the general public and experts gathered from social media and web articles, we can improve our understanding of who is likely to perform well in upcoming matches. When tested on the English Premier League 2018/19 season, the model outperformed regular statistical predictors by over 300 points, an average of 11 points per week, ranking within the top 0.5% of players (rank 30,000 out of over 6.5 million players).
A Unified Framework for Random Forest Prediction Error Estimation
We introduce a unified framework for random forest prediction error estimation based on a novel estimator of the conditional prediction error distribution function. Our framework enables immediate estimation of key parameters often of interest, including conditional mean squared prediction errors, conditional biases, and conditional quantiles, by a straightforward plug-in routine. Our approach is particularly well-adapted for prediction interval estimation, which has received less attention in the random forest literature despite its practical utility; we show via simulations that our proposed prediction intervals are competitive with, and in some settings outperform, existing methods. To establish theoretical grounding for our framework, we prove pointwise uniform consistency of a more stringent version of our estimator of the conditional prediction error distribution. In addition to providing a suite of measures of prediction uncertainty, our general framework is applicable to many variants of the random forest algorithm. The estimators introduced here are implemented in the R package forestError.
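The plug-in idea can be sketched as follows: once an estimate of the conditional error distribution is available as a set of errors with weights, the bias, mean squared prediction error, and interval endpoints are all read off that one distribution. This is a hypothetical sketch, not the forestError implementation; the weights are assumed to be given, errors are assumed to be defined as observed minus predicted, and the function names are introduced here.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= q - 1e-12:   # small tolerance for float rounding
            return value
    return pairs[-1][0]

def error_summaries(prediction, errors, weights, alpha=0.1):
    """Plug-in summaries of a weighted empirical error distribution.

    Assumes error = observed - predicted, so that adding error quantiles
    to the point prediction yields a (1 - alpha) prediction interval.
    """
    total = sum(weights)
    bias = sum(w * e for e, w in zip(errors, weights)) / total
    mspe = sum(w * e * e for e, w in zip(errors, weights)) / total
    lower = prediction + weighted_quantile(errors, weights, alpha / 2)
    upper = prediction + weighted_quantile(errors, weights, 1 - alpha / 2)
    return {"bias": bias, "mspe": mspe, "interval": (lower, upper)}

# Made-up errors with equal weights, purely for illustration.
summary = error_summaries(prediction=10.0,
                          errors=[-2.0, -1.0, 0.0, 1.0, 2.0],
                          weights=[1, 1, 1, 1, 1],
                          alpha=0.4)
print(summary)
```

Because every summary is derived from the same weighted distribution, adding another measure of uncertainty only requires another function of the same errors and weights.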