AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Stability of Multi-Task Kernel Regression Algorithms

arXiv.org Machine LearningJun-17-2013

We study the stability properties of nonlinear multi-task regression in reproducing Hilbert spaces with operator-valued kernels. Such kernels, a.k.a. multi-task kernels, are appropriate for learning prob- lems with nonscalar outputs like multi-task learning and structured out- put prediction. We show that multi-task kernel regression algorithms are uniformly stable in the general case of infinite-dimensional output spaces. We then derive under mild assumption on the kernel generaliza- tion bounds of such algorithms, and we show their consistency even with non Hilbert-Schmidt operator-valued kernels . We demonstrate how to apply the results to various multi-task kernel regression methods such as vector-valued SVR and functional ridge regression.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1306.3905

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Spectral Experts for Estimating Mixtures of Linear Regressions

Chaganty, Arun Tejasvi, Liang, Percy

arXiv.org Machine LearningJun-16-2013

Discriminative latent-variable models are typically learned using EM or gradient-based optimization, which suffer from local optima. In this paper, we develop a new computationally efficient and provably consistent estimator for a mixture of linear regressions, a simple instance of a discriminative latent-variable model. Our approach relies on a low-rank linear regression to recover a symmetric tensor, which can be factorized into the parameters using a tensor power method. We prove rates of convergence for our estimator and provide an empirical evaluation illustrating its strengths relative to local optimization (EM).

artificial intelligence, machine learning, regression, (15 more...)

arXiv.org Machine Learning

1306.3729

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)

Add feedback

Early stopping and non-parametric regression: An optimal data-dependent stopping rule

Raskutti, Garvesh, Wainwright, Martin J., Yu, Bin

arXiv.org Machine LearningJun-15-2013

The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of gradient-descent applied to the least-squares loss function. We propose a data-dependent stopping rule that does not involve hold-out or cross-validation data, and we prove upper bounds on the squared error of the resulting function estimate, measured in either the $L^2(P)$ and $L^2(P_n)$ norm. These upper bounds lead to minimax-optimal rates for various kernel classes, including Sobolev smoothness classes and other forms of reproducing kernel Hilbert spaces. We show through simulation that our stopping rule compares favorably to two other stopping rules, one based on hold-out data and the other based on Stein's unbiased risk estimate. We also establish a tight connection between our early stopping strategy and the solution path of a kernel ridge regression estimator.

artificial intelligence, machine learning, regression, (16 more...)

arXiv.org Machine Learning

1306.3574

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Add feedback

Constrained fractional set programs and their application in local clustering and community detection

Bühler, Thomas, Rangapuram, Syama Sundar, Setzer, Simon, Hein, Matthias

arXiv.org Machine LearningJun-14-2013

The (constrained) minimization of a ratio of set functions is a problem frequently occurring in clustering and community detection. As these optimization problems are typically NP-hard, one uses convex or spectral relaxations in practice. While these relaxations can be solved globally optimally, they are often too loose and thus lead to results far away from the optimum. In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem. This result leads to a flexible framework for solving constrained problems in network analysis. While a globally optimal solution for the resulting non-convex problem cannot be guaranteed, we outperform the loose convex or spectral relaxations by a large margin on constrained local clustering problems.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

1306.3409

Country:

North America > United States (0.28)
Europe > Germany > Saarland (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Hyperparameter Optimization and Boosting for Classifying Facial Expressions: How good can a "Null" Model be?

Bergstra, James, Cox, David D.

arXiv.org Machine LearningJun-14-2013

One of the goals of the ICML workshop on representation and learning is to establish benchmark scores for a new data set of labeled facial expressions. This paper presents the performance of a "Null model" consisting of convolutions with random weights, PCA, pooling, normalization, and a linear readout. Our approach focused on hyperparameter optimization rather than novel model components. On the Facial Expression Recognition Challenge held by the Kaggle website, our hyperparameter optimization approach achieved a score of 60% accuracy on the test data. This paper also introduces a new ensemble construction variant that combines hyperparameter optimization with the construction of ensembles. This algorithm constructed an ensemble of four models that scored 65.5% accuracy. These scores rank 12th and 5th respectively among the 56 challenge participants. It is worth noting that our approach was developed prior to the release of the data set, and applied without modification; our strong competition performance suggests that the TPE hyperparameter optimization algorithm and domain expertise encoded in our Null model can generalize to new image classification data sets.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1306.3476

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Classifying Single-Trial EEG during Motor Imagery with a Small Training Set

Wang, Yijun

arXiv.org Machine LearningJun-14-2013

Before the operation of a motor imagery based brain-computer interface (BCI) adopting machine learning techniques, a cumbersome training procedure is unavoidable. The development of a practical BCI posed the challenge of classifying single-trial EEG with a small training set. In this letter, we addressed this problem by employing a series of signal processing and machine learning approaches to alleviate overfitting and obtained test accuracy similar to training accuracy on the datasets from BCI Competition III and our own experiments.

artificial intelligence, classification, machine learning, (16 more...)

arXiv.org Machine Learning

1306.3474

Country: North America > United States > California > San Diego County (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.49)

Add feedback

Physeter catodon localization by sparse coding

Paris, Sébastien, Doh, Yann, Glotin, Hervé, Halkias, Xanadu, Razik, Joseph

arXiv.org Machine LearningJun-13-2013

This paper presents a spermwhale' localization architecture using jointly a bag-of-features (BoF) approach and machine learning framework. BoF methods are known, especially in computer vision, to produce from a collection of local features a global representation invariant to principal signal transformations. Our idea is to regress supervisely from these local features two rough estimates of the distance and azimuth thanks to some datasets where both acoustic events and ground-truth position are now available. Furthermore, these estimates can feed a particle filter system in order to obtain a precise spermwhale' position even in mono-hydrophone configuration. Anti-collision system and whale watching are considered applications of this work.

artificial intelligence, machine learning, physeter catodon localization, (15 more...)

arXiv.org Machine Learning

1306.3058

Country: North America > United States (0.14)

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

Hsieh, Cho-Jui, Sustik, Matyas A., Dhillon, Inderjit S., Ravikumar, Pradeep

arXiv.org Machine LearningJun-13-2013

The L1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to recent state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton's method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and present experimental results using synthetic and real-world application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1306.3212

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
(3 more...)

Add feedback

A Linear Approximation to the chi^2 Kernel with Geometric Convergence

Li, Fuxin, Lebanon, Guy, Sminchisescu, Cristian

arXiv.org Machine LearningJun-12-2013

We propose a new analytical approximation to the $\chi^2$ kernel that converges geometrically. The analytical approximation is derived with elementary methods and adapts to the input distribution for optimal convergence rate. Experiments show the new approximation leads to improved performance in image classification and semantic segmentation tasks using a random Fourier feature approximation of the $\exp-\chi^2$ kernel. Besides, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the expense of only an additional constant factor to the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. Experiments conducted on the PASCAL VOC 2010 segmentation and the ImageNet ILSVRC 2010 datasets show statistically significant improvements over alternative approximation methods.

artificial intelligence, kernel, machine learning, (12 more...)

arXiv.org Machine Learning

1206.4074

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Non-parametric Power-law Data Clustering

Fan, Xuhui, Zeng, Yiling, Cao, Longbing

arXiv.org Machine LearningJun-12-2013

It has always been a great challenge for clustering algorithms to automatically determine the cluster numbers according to the distribution of datasets. Several approaches have been proposed to address this issue, including the recent promising work which incorporate Bayesian Nonparametrics into the $k$-means clustering procedure. This approach shows simplicity in implementation and solidity in theory, while it also provides a feasible way to inference in large scale datasets. However, several problems remains unsolved in this pioneering work, including the power-law data applicability, mechanism to merge centers to avoid the over-fitting problem, clustering order problem, e.t.c.. To address these issues, the Pitman-Yor Process based k-means (namely \emph{pyp-means}) is proposed in this paper. Taking advantage of the Pitman-Yor Process, \emph{pyp-means} treats clusters differently by dynamically and adaptively changing the threshold to guarantee the generation of power-law clustering results. Also, one center agglomeration procedure is integrated into the implementation to be able to merge small but close clusters and then adaptively determine the cluster number. With more discussion on the clustering order, the convergence proof, complexity analysis and extension to spectral clustering, our approach is compared with traditional clustering algorithm and variational inference methods. The advantages and properties of pyp-means are validated by experiments on both synthetic datasets and real world datasets.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

1306.3003

Genre: Research Report > New Finding (0.46)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback