AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

arXiv.org Machine LearningJan-7-2016

Learning the kernel functions used in kernel methods has been a vastly explored area in machine learning. It is now widely accepted that to obtain 'good' performance, learning a kernel function is the key challenge. In this work we focus on learning kernel representations for structured regression. We propose use of polynomials expansion of kernels, referred to as Schoenberg transforms and Gegenbaur transforms, which arise from the seminal result of Schoenberg (1938). These kernels can be thought of as polynomial combination of input features in a high dimensional reproducing kernel Hilbert space (RKHS). We learn kernels over input and output for structured data, such that, dependency between kernel features is maximized. We use Hilbert-Schmidt Independence Criterion (HSIC) to measure this. We also give an efficient, matrix decomposition-based algorithm to learn these kernel transformations, and demonstrate state-of-the-art results on several real-world datasets.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Machine Learning

1601.01411

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.74)

Add feedback

Dropout as data augmentation

Bouthillier, Xavier, Konda, Kishore, Vincent, Pascal, Memisevic, Roland

arXiv.org Machine LearningJan-7-2016

Dropout is typically interpreted as bagging a large number of models sharing parameters. We show that using dropout in a network can also be interpreted as a kind of data augmentation in the input space without domain knowledge. We present an approach to projecting the dropout noise within a network back into the input space, thereby generating augmented versions of the training data, and we show that training a deterministic network on the augmented samples yields similar results. Finally, we propose a new dropout noise scheme based on our observations and show that it improves dropout results without adding significant computational cost.

artificial intelligence, dropout, machine learning, (16 more...)

arXiv.org Machine Learning

1506.087

Country:

North America (0.29)
Europe > Germany (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Robust EM kernel-based methods for linear system identification

Bottegal, Giulio, Aravkin, Aleksandr Y., Hjalmarsson, Håkan, Pillonetto, Gianluigi

arXiv.org Machine LearningJan-7-2016

Recent developments in system identification have brought attention to regularized kernel-based methods. This type of approach has been proven to compare favorably with classic parametric methods. However, current formulations are not robust with respect to outliers. In this paper, we introduce a novel method to robustify kernel-based system identification methods. To this end, we model the output measurement noise using random variables with heavy-tailed probability density functions (pdfs), focusing on the Laplacian and the Student's t distributions. Exploiting the representation of these pdfs as scale mixtures of Gaussians, we cast our system identification problem into a Gaussian process regression framework, which requires estimating a number of hyperparameters of the data size order. To overcome this difficulty, we design a new maximum a posteriori (MAP) estimator of the hyperparameters, and solve the related optimization problem with a novel iterative scheme based on the Expectation-Maximization (EM) method. In presence of outliers, tests on simulated data and on a real system show a substantial performance improvement compared to currently used kernel-based methods for linear system identification.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1411.5915

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback

Probabilistic Programming with Gaussian Process Memoization

Schaechtle, Ulrich, Zinberg, Ben, Radul, Alexey, Stathis, Kostas, Mansinghka, Vikash K.

arXiv.org Machine LearningJan-5-2016

Gaussian Processes (GPs) are widely used tools in statistics, machine learning, robotics, computer vision, and scientific computation. However, despite their popularity, they can be difficult to apply; all but the simplest classification or regression applications require specification and inference over complex covariance functions that do not admit simple analytical posteriors. This paper shows how to embed Gaussian processes in any higher-order probabilistic programming language, using an idiom based on memoization, and demonstrates its utility by implementing and extending classic and state-of-the-art GP applications. The interface to Gaussian processes, called gpmem, takes an arbitrary real-valued computational process as input and returns a statistical emulator that automatically improve as the original process is invoked and its input-output behavior is recorded. The flexibility of gpmem is illustrated via three applications: (i) robust GP regression with hierarchical hyper-parameter learning, (ii) discovering symbolic expressions from time-series data by fully Bayesian structure learning over kernels generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian optimization with automatic inference and action selection. All applications share a single 50-line Python library and require fewer than 20 lines of probabilistic code each.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1512.05665

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Convergence of Stochastic Gradient Descent for PCA

Shamir, Ohad

arXiv.org Machine LearningJan-4-2016

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in $\reals^d$. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in \cite{hardt2014noisy}. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

artificial intelligence, machine learning, probability, (17 more...)

arXiv.org Machine Learning

1509.09002

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Modelling-based experiment retrieval: A case study with gene expression clustering

Blomstedt, Paul, Dutta, Ritabrata, Seth, Sohan, Brazma, Alvis, Kaski, Samuel

arXiv.org Machine LearningJan-4-2016

Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case vs. control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well. Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Empirical results finally suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. $k$-means). The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method. Availability: The method can be implemented using standard clustering algorithms and normalized information distance, available in many statistical software packages.

artificial intelligence, experiment, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1093/bioinformatics/btv762

1505.05007

Country: Europe (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Supervised Dimensionality Reduction via Distance Correlation Maximization

Vepakomma, Praneeth, Tonde, Chetan, Elgammal, Ahmed

arXiv.org Machine LearningJan-2-2016

In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation, Szekely et. al. (2007). We propose an objective which is free of distributional assumptions on regression variables and regression model assumptions. Our proposed formulation is based on learning a low-dimensional feature representation $\mathbf{z}$, which maximizes the squared sum of Distance Correlations between low dimensional features $\mathbf{z}$ and response $y$, and also between features $\mathbf{z}$ and covariates $\mathbf{x}$. We propose a novel algorithm to optimize our proposed objective using the Generalized Minimization Maximizaiton method of \Parizi et. al. (2015). We show superior empirical results on multiple datasets proving the effectiveness of our proposed approach over several relevant state-of-the-art supervised dimensionality reduction methods.

data mining, distance correlation, machine learning, (12 more...)

arXiv.org Machine Learning

1601.00236

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.81)

Add feedback

Joint Estimation of Precision Matrices in Heterogeneous Populations

Saegusa, Takumi, Shojaie, Ali

arXiv.org Machine LearningJan-2-2016

We introduce a general framework for estimation of inverse covariance, or precision, matrices from heterogeneous populations. The proposed framework uses a Laplacian shrinkage penalty to encourage similarity among estimates from disparate, but related, subpopulations, while allowing for differences among matrices. We propose an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation, as well as its extension for faster computation in high dimensions by thresholding the empirical covariance matrix to identify the joint block diagonal structure in the estimated precision matrices. We establish both variable selection and norm consistency of the proposed estimator for distributions with exponential or polynomial tails. Further, to extend the applicability of the method to the settings with unknown populations structure, we propose a Laplacian penalty based on hierarchical clustering, and discuss conditions under which this data-driven choice results in consistent estimation of precision matrices in heterogenous populations. Extensive numerical studies and applications to gene expression data from subtypes of cancer with distinct clinical outcomes indicate the potential advantages of the proposed method over existing approaches.

artificial intelligence, machine learning, matrix, (18 more...)

arXiv.org Machine Learning

1601.00142

Genre:

Research Report > Experimental Study (0.65)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Adaptive Independent Sticky MCMC algorithms

Martino, L., Casarin, R., Leisen, F., Luengo, D.

arXiv.org Machine LearningJan-2-2016

In this work, we introduce a novel class of adaptive Monte Carlo methods, called adaptive independent sticky MCMC algorithms, for efficient sampling from a generic target probability density function (pdf). The new class of algorithms employs adaptive non-parametric proposal densities which become closer and closer to the target as the number of iterations increases. The proposal pdf is built using interpolation procedures based on a set of support points which is constructed iteratively based on previously drawn samples. The algorithm's efficiency is ensured by a test that controls the evolution of the set of support points. This extra stage controls the computational cost and the convergence of the proposal density to the target. Each part of the novel family of algorithms is discussed and several examples are provided. Although the novel algorithms are presented for univariate target densities, we show that they can be easily extended to the multivariate context within a Gibbs-type sampler. The ergodicity is ensured and discussed. Exhaustive numerical examples illustrate the efficiency of sticky schemes, both as a stand-alone methods to sample from complicated one-dimensional pdfs and within Gibbs in order to draw from multi-dimensional target distributions.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1308.3779

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Bayesian Optimization for Likelihood-Free Inference of Simulator-Based Statistical Models

Gutmann, Michael U., Corander, Jukka

arXiv.org Machine LearningDec-31-2015

Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude.

bayesian inference, likelihood, survey article, (18 more...)

arXiv.org Machine Learning

1501.03291

Country:

Europe > Finland (0.14)
North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback