Statistical Learning
Functional Gaussian Process Model for Bayesian Nonparametric Analysis
Duan, Leo L., Wang, Xia, Szczesniak, Rhonda D.
Gaussian process is a theoretically appealing model for nonparametric analysis, but its computational cumbersomeness hinders its use in large scale and the existing reduced-rank solutions are usually heuristic. In this work, we propose a novel construction of Gaussian process as a projection from fixed discrete frequencies to any continuous location. This leads to a valid stochastic process that has a theoretic support with the reduced rank in the spectral density, as well as a high-speed computing algorithm. Our method provides accurate estimates for the covariance parameters and concise form of predictive distribution for spatial prediction. For non-stationary data, we adopt the mixture framework with a customized spectral dependency structure. This enables clustering based on local stationarity, while maintains the joint Gaussianness. Our work is directly applicable in solving some of the challenges in the spatial data, such as large scale computation, anisotropic covariance, spatio-temporal modeling, etc. We illustrate the uses of the model via simulations and an application on a massive dataset.
Poisson Subsampling Algorithms for Large Sample Linear Regression in Massive Data
Large sample size brings the computation bottleneck for modern data analysis. Subsampling is one of efficient strategies to handle this problem. In previous studies, researchers make more fo- cus on subsampling with replacement (SSR) than on subsampling without replacement (SSWR). In this paper we investigate a kind of SSWR, poisson subsampling (PSS), for fast algorithm in ordinary least-square problem. We establish non-asymptotic property, i.e, the error bound of the correspond- ing subsample estimator, which provide a tradeoff between computation cost and approximation efficiency. Besides the non-asymptotic result, we provide asymptotic consistency and normality of the subsample estimator. Methodologically, we propose a two-step subsampling algorithm, which is efficient with respect to a statistical objective and independent on the linear model assumption.. Synthetic and real data are used to empirically study our proposed subsampling strategies. We argue by these empirical studies that, (1) our proposed two-step algorithm has obvious advantage when the assumed linear model does not accurate, and (2) the PSS strategy performs obviously better than SSR when the subsampling ratio increases.
Stochastic Parallel Block Coordinate Descent for Large-scale Saddle Point Problems
Zhu, Zhanxing, Storkey, Amos J.
We consider convex-concave saddle point problems with a separable structure and non-strongly convex functions. We propose an efficient stochastic block coordinate descent method using adaptive primal-dual updates, which enables flexible parallel optimization for large-scale problems. Our method shares the efficiency and flexibility of block coordinate descent methods with the simplicity of primal-dual methods and utilizing the structure of the separable convex-concave saddle point problem. It is capable of solving a wide range of machine learning applications, including robust principal component analysis, Lasso, and feature selection by group Lasso, etc. Theoretically and empirically, we demonstrate significantly better performance than state-of-the-art methods in all these applications.
Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions
Shah, Amar, Ghahramani, Zoubin
We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications, including problems in machine learning, rocket science and robotics.
Bayesian Evidence and Model Selection
Knuth, Kevin H., Habeck, Michael, Malakar, Nabin K., Mubeen, Asim M., Placek, Ben
In this paper we review the concepts of Bayesian evidence and Bayes factors, also known as log odds ratios, and their application to model selection. The theory is presented along with a discussion of analytic, approximate and numerical techniques. Specific attention is paid to the Laplace approximation, variational Bayes, importance sampling, thermodynamic integration, and nested sampling and its recent variants. Analogies to statistical physics, from which many of these techniques originate, are discussed in order to provide readers with deeper insights that may lead to new techniques. The utility of Bayesian model testing in the domain sciences is demonstrated by presenting four specific practical examples considered within the context of signal processing in the areas of signal detection, sensor characterization, scientific model selection and molecular force characterization.
Sparse Linear Models applied to Power Quality Disturbance Classification
Lรณpez-Lopera, Andrรฉs F., รlvarez, Mauricio A., Orozco, รvaro A.
Power quality (PQ) analysis describes the non-pure electric signals that are usually present in electric power systems. The automatic recognition of PQ disturbances can be seen as a pattern recognition problem, in which different types of waveform distortion are differentiated based on their features. Similar to other quasi-stationary signals, PQ disturbances can be decomposed into time-frequency dependent components by using time-frequency or time-scale transforms, also known as dictionaries. These dictionaries are used in the feature extraction step in pattern recognition systems. Short-time Fourier, Wavelets and Stockwell transforms are some of the most common dictionaries used in the PQ community, aiming to achieve a better signal representation. To the best of our knowledge, previous works about PQ disturbance classification have been restricted to the use of one among several available dictionaries. Taking advantage of the theory behind sparse linear models (SLM), we introduce a sparse method for PQ representation, starting from overcomplete dictionaries. In particular, we apply Group Lasso. We employ different types of time-frequency (or time-scale) dictionaries to characterize the PQ disturbances, and evaluate their performance under different pattern recognition algorithms. We show that the SLM reduce the PQ classification complexity promoting sparse basis selection, and improving the classification accuracy.
Optimal Subsampling Approaches for Large Sample Linear Regression
Zhu, Rong, Ma, Ping, Mahoney, Michael W., Yu, Bin
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample from the original full sample and uses it as a surrogate for subsequent computation and estimation. In this paper, we study subsampling methods under two scenarios: approximating the full sample ordinary least-square (OLS) estimator and estimating the coefficients in linear regression. We present two algorithms, weighted estimation algorithm and unweighted estimation algorithm, and analyze asymptotic behaviors of their resulting subsample estimators under general conditions. For the weighted estimation algorithm, we propose a criterion for selecting the optimal sampling probability by making use of the asymptotic results. On the basis of the criterion, we provide two novel subsampling methods, the optimal subsampling and the predictor- length subsampling methods. The predictor-length subsampling method is based on the L2 norm of predictors rather than leverage scores. Its computational cost is scalable. For unweighted estimation algorithm, we show that its resulting subsample estimator is not consistent to the full sample OLS estimator. However, it has better performance than the weighted estimation algorithm for estimating the coefficients. Simulation studies and a real data example are used to demonstrate the effectiveness of our proposed subsampling methods.
Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric
Given a family of probability measures in P (X), the space of probability measures on a Hilbert space X, our goal in this paper is to highlight one ore more curves in P (X) that summarize efficiently that family. We propose to study this problem under the optimal transport (Wasserstein) geometry, using curves that are restricted to be geodesic segments under that metric. We show that concepts that play a key role in Euclidean PCA, such as data centering or orthogonality of principal directions, find a natural equivalent in the optimal transport geometry, using Wasserstein means and differential geometry. The implementation of these ideas is, however, computationally challenging. To achieve scalable algorithms that can handle thousands of measures, we propose to use a relaxed definition for geodesics and regularized optimal transport distances. The interest of our approach is demonstrated on images seen either as shapes or color histograms.
A Likelihood Ratio Framework for High Dimensional Semiparametric Regression
Ning, Yang, Zhao, Tianqi, Liu, Han
We propose a likelihood ratio based inferential framework for high dimensional semiparametric generalized linear models. This framework addresses a variety of challenging problems in high dimensional data analysis, including incomplete data, selection bias, and heterogeneous multitask learning. Our work has three main contributions. (i) We develop a regularized statistical chromatography approach to infer the parameter of interest under the proposed semiparametric generalized linear model without the need of estimating the unknown base measure function. (ii) We propose a new framework to construct post-regularization confidence regions and tests for the low dimensional components of high dimensional parameters. Unlike existing post-regularization inferential methods, our approach is based on a novel directional likelihood. In particular, the framework naturally handles generic regularized estimators with nonconvex penalty functions and it can be used to infer least false parameters under misspecified models. (iii) We develop new concentration inequalities and normal approximation results for U-statistics with unbounded kernels, which are of independent interest. We demonstrate the consequences of the general theory by using an example of missing data problem. Extensive simulation studies and real data analysis are provided to illustrate our proposed approach.
Learning Adversary Behavior in Security Games: A PAC Model Perspective
Sinha, Arunesh, Kar, Debarun, Tambe, Milind
Recent applications of Stackelberg Security Games (SSG), from wildlife crime to urban crime, have employed machine learning tools to learn and predict adversary behavior using available data about defender-adversary interactions. Given these recent developments, this paper commits to an approach of directly learning the response function of the adversary. Using the PAC model, this paper lays a firm theoretical foundation for learning in SSGs (e.g., theoretically answer questions about the numbers of samples required to learn adversary behavior) and provides utility guarantees when the learned adversary model is used to plan the defender's strategy. The paper also aims to answer practical questions such as how much more data is needed to improve an adversary model's accuracy. Additionally, we explain a recently observed phenomenon that prediction accuracy of learned adversary behavior is not enough to discover the utility maximizing defender strategy. We provide four main contributions: (1) a PAC model of learning adversary response functions in SSGs; (2) PAC-model analysis of the learning of key, existing bounded rationality models in SSGs; (3) an entirely new approach to adversary modeling based on a non-parametric class of response functions with PAC-model analysis and (4) identification of conditions under which computing the best defender strategy against the learned adversary behavior is indeed the optimal strategy. Finally, we conduct experiments with real-world data from a national park in Uganda, showing the benefit of our new adversary modeling approach and verification of our PAC model predictions.