AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Exact Post Model Selection Inference for Marginal Screening

arXiv.org Machine LearningFeb-27-2014

We develop a framework for post model selection inference, via marginal screening, in linear regression. At the core of this framework is a result that characterizes the exact distribution of linear functions of the response $y$, conditional on the model being selected (``condition on selection" framework). This allows us to construct valid confidence intervals and hypothesis tests for regression coefficients that account for the selection procedure. In contrast to recent work in high-dimensional statistics, our results are exact (non-asymptotic) and require no eigenvalue-like assumptions on the design matrix $X$. Furthermore, the computational cost of marginal regression, constructing confidence intervals and hypothesis testing is negligible compared to the cost of linear regression, thus making our methods particularly suitable for extremely large datasets. Although we focus on marginal screening to illustrate the applicability of the condition on selection framework, this framework is much more broadly applicable. We show how to apply the proposed framework to several other selection procedures including orthogonal matching pursuit, non-negative least squares, and marginal screening+Lasso.

artificial intelligence, confidence interval, machine learning, (16 more...)

arXiv.org Machine Learning

1402.5596

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Robust Asymmetric Clustering

Morris, Katherine, McNicholas, Paul D., Punzo, Antonio, Browne, Ryan P.

arXiv.org Machine LearningFeb-26-2014

Contaminated mixture models are developed for model-based clustering of data with asymmetric clusters as well as spurious points, outliers, and/or noise. Specifically, we introduce a contaminated mixture of contaminated shifted asymmetric Laplace distributions and a contaminated mixture of contaminated skew-normal distributions. In each case, mixture components have a parameter controlling the proportion of bad points (i.e., spurious points, outliers, and/or noise) and one specifying the degree of contamination. A very important feature of our approaches is that these parameters do not have to be specified a priori. Expectation-conditional maximization algorithms are outlined for parameter estimation and the number of components is selected using the Bayesian information criterion. The performance of our approaches is illustrated on artificial and real data.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1402.6744

Country: North America > United States > Kansas (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Machine Learning at Scale

Izrailev, Sergei, Stanley, Jeremy M.

arXiv.org Machine LearningFeb-25-2014

It takes skill to build a meaningful predictive model even with the abundance of implementations of modern machine learning algorithms and readily available computing resources. Building a model becomes challenging if hundreds of terabytes of data need to be processed to produce the training data set. In a digital advertising technology setting, we are faced with the need to build thousands of such models that predict user behavior and power advertising campaigns in a 24/7 chaotic real-time production environment. As data scientists, we also have to convince other internal departments critical to implementation success, our management, and our customers that our machine learning system works. In this paper, we present the details of the design and implementation of an automated, robust machine learning platform that impacts billions of advertising impressions monthly. This platform enables us to continuously optimize thousands of campaigns over hundreds of millions of users, on multiple continents, against varying performance objectives.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1402.6076

Country: North America > United States > New York (0.28)

Genre: Research Report > Experimental Study (0.95)

Industry:

Marketing (1.00)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Regularization of $\ell_1$ minimization for dealing with outliers and noise in Statistics and Signal Recovery

Flores, Salvador, Briceno-Arias, Luis M.

arXiv.org Machine LearningFeb-25-2014

We study the robustness properties of $\ell_1$ norm minimization for the classical linear regression problem with a given design matrix and contamination restricted to the dependent variable. We perform a fine error analysis of the $\ell_1$ estimator for measurements errors consisting of outliers coupled with noise. We introduce a new estimation technique resulting from a regularization of $\ell_1$ minimization by inf-convolution with the $\ell_2$ norm. Concerning robustness to large outliers, the proposed estimator keeps the breakdown point of the $\ell_1$ estimator, and reduces to least squares when there are not outliers. We present a globally convergent forward-backward algorithm for computing our estimator and some numerical experiments confirming its theoretical properties.

artificial intelligence, estimator, machine learning, (20 more...)

arXiv.org Machine Learning

1310.7637

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Path Thresholding: Asymptotically Tuning-Free High-Dimensional Sparse Regression

Vats, Divyanshu, Baraniuk, Richard G.

arXiv.org Machine LearningFeb-23-2014

In this paper, we address the challenging problem of selecting tuning parameters for high-dimensional sparse regression. We propose a simple and computationally efficient method, called path thresholding (PaTh), that transforms any tuning parameter-dependent sparse regression algorithm into an asymptotically tuning-free sparse regression algorithm. More specifically, we prove that, as the problem size becomes large (in the number of variables and in the number of observations), PaTh performs accurate sparse regression, under appropriate conditions, without specifying a tuning parameter. In finite-dimensional settings, we demonstrate that PaTh can alleviate the computational burden of model selection algorithms by significantly reducing the search space of tuning parameters.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

1402.5584

Country: Europe (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

Add feedback

Bayesian Inference for NMR Spectroscopy with Applications to Chemical Quantification

Wilson, Andrew Gordon, Wu, Yuting, Holland, Daniel J., Nowozin, Sebastian, Mantle, Mick D., Gladden, Lynn F., Blake, Andrew

arXiv.org Machine LearningFeb-23-2014

Nuclear magnetic resonance (NMR) spectroscopy exploits the magnetic properties of atomic nuclei to discover the structure, reaction state and chemical environment of molecules. We propose a probabilistic generative model and inference procedures for NMR spectroscopy. Specifically, we use a weighted sum of trigonometric functions undergoing exponential decay to model free induction decay (FID) signals. We discuss the challenges in estimating the components of this general model -- amplitudes, phase shifts, frequencies, decay rates, and noise variances -- and offer practical solutions. We compare with conventional Fourier transform spectroscopy for estimating the relative concentrations of chemicals in a mixture, using synthetic and experimentally acquired FID signals. We find the proposed model is particularly robust to low signal to noise ratios (SNR), and overlapping peaks in the Fourier transform of the FID, enabling accurate predictions (e.g., 1% sensitivity at low SNR) which are not possible with conventional spectroscopy (5% sensitivity).

artificial intelligence, concentration, machine learning, (16 more...)

arXiv.org Machine Learning

1402.358

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.29)

Genre:

Research Report (0.50)
Personal > Honors (0.46)

Industry: Materials > Chemicals (0.49)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Accelerating ABC methods using Gaussian processes

Wilkinson, Richard D

arXiv.org Machine LearningFeb-23-2014

Approximate Bayesian computation (ABC) methods are used to approximate posterior distributions using simulation rather than likelihood calculations. We introduce Gaussian process (GP) accelerated ABC, which we show can significantly reduce the number of simulations required. As computational resource is usually the main determinant of accuracy in ABC, GP-accelerated methods can thus enable more accurate inference in some models. GP models of the unknown log-likelihood function are used to exploit continuity and smoothness, reducing the required computation. We use a sequence of models that increase in accuracy, using intermediate models to rule out regions of the parameter space as implausible. The methods will not be suitable for all problems, but when they can be used, can result in significant computational savings. For the Ricker model, we are able to achieve accurate approximations to the posterior distribution using a factor of 100 fewer simulator evaluations than comparable Monte Carlo approaches, and for a population genetics model we are able to approximate the exact posterior for the first time.

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv.org Machine Learning

1401.1436

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Add feedback

Swapping Variables for High-Dimensional Sparse Regression with Correlated Measurements

Vats, Divyanshu, Baraniuk, Richard G.

arXiv.org Machine LearningFeb-22-2014

We consider the high-dimensional sparse linear regression problem of accurately estimating a sparse vector using a small number of linear measurements that are contaminated by noise. It is well known that the standard cadre of computationally tractable sparse regression algorithms---such as the Lasso, Orthogonal Matching Pursuit (OMP), and their extensions---perform poorly when the measurement matrix contains highly correlated columns. To address this shortcoming, we develop a simple greedy algorithm, called SWAP, that iteratively swaps variables until convergence. SWAP is surprisingly effective in handling measurement matrices with high correlations. In fact, we prove that SWAP outputs the true support, the locations of the non-zero entries in the sparse vector, under a relatively mild condition on the measurement matrix. Furthermore, we show that SWAP can be used to boost the performance of any sparse regression algorithm. We empirically demonstrate the advantages of SWAP by comparing it with several state-of-the-art sparse regression algorithms.

algorithm, sparse regression algorithm, swap, (13 more...)

arXiv.org Machine Learning

1312.1706

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Semi-Supervised Nonlinear Distance Metric Learning via Forests of Max-Margin Cluster Hierarchies

Johnson, David M., Xiong, Caiming, Corso, Jason J.

arXiv.org Machine LearningFeb-22-2014

Metric learning is a key problem for many data mining and machine learning applications, and has long been dominated by Mahalanobis methods. Recent advances in nonlinear metric learning have demonstrated the potential power of non-Mahalanobis distance functions, particularly tree-based functions. We propose a novel nonlinear metric learning method that uses an iterative, hierarchical variant of semi-supervised max-margin clustering to construct a forest of cluster hierarchies, where each individual hierarchy can be interpreted as a weak metric over the data. By introducing randomness during hierarchy training and combining the output of many of the resulting semi-random weak hierarchy metrics, we can obtain a powerful and robust nonlinear metric model. This method has two primary contributions: first, it is semi-supervised, incorporating information from both constrained and unconstrained points. Second, we take a relaxed approach to constraint satisfaction, allowing the method to satisfy different subsets of the constraints at different levels of the hierarchy rather than attempting to simultaneously satisfy all of them. This leads to a more robust learning algorithm. We compare our method to a number of state-of-the-art benchmarks on $k$-nearest neighbor classification, large-scale image retrieval and semi-supervised clustering problems, and find that our algorithm yields results comparable or superior to the state-of-the-art, and is significantly more robust to noise.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

1402.5565

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Scaling Nonparametric Bayesian Inference via Subsample-Annealing

Obermeyer, Fritz, Glidden, Jonathan, Jonas, Eric

arXiv.org Machine LearningFeb-21-2014

We describe an adaptation of the simulated annealing algorithm to nonparametric clustering and related probabilistic models. This new algorithm learns nonparametric latent structure over a growing and constantly churning subsample of training data, where the portion of data subsampled can be interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs sampling at high temperature (i.e., with a very small subsample) can more quickly explore sketches of the final latent state by (a) making longer jumps around latent space (as in block Gibbs) and (b) lowering energy barriers (as in simulated annealing). We prove subsample annealing speeds up mixing time N^2 -> N in a simple clustering model and exp(N) -> N in another class of models, where N is data size. Empirically subsample-annealing outperforms naive Gibbs sampling in accuracy-per-wallclock time, and can scale to larger datasets and deeper hierarchical models. We demonstrate improved inference on million-row subsamples of US Census data and network log data and a 307-row hospital rating dataset, using a Pitman-Yor generalization of the Cross Categorization model.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

1402.5473

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.34)
Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

Add feedback