AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Optimization with Sparsity-Inducing Penalties

Bach, Francis, Jenatton, Rodolphe, Mairal, Julien, Obozinski, Guillaume

arXiv.org Machine LearningNov-22-2011

Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

1108.0775

Country: North America > United States > California (0.27)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization

Agarwal, Alekh, Bartlett, Peter L., Ravikumar, Pradeep, Wainwright, Martin J.

arXiv.org Machine LearningNov-20-2011

Relative to the large literature on upper bounds on complexity of convex optimization, lesser attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining an understanding of these complexity-theoretic issues is important. In this paper, we study the complexity of stochastic convex optimization in an oracle model of computation. We improve upon known results and obtain tight minimax complexity estimates for various function classes.

artificial intelligence, machine learning, optimization, (16 more...)

arXiv.org Machine Learning

1009.0571

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

Add feedback

Joint Modeling of Multiple Related Time Series via the Beta Process

Fox, Emily B., Sudderth, Erik B., Jordan, Michael I., Willsky, Alan S.

arXiv.org Machine LearningNov-17-2011

We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our approach is based on the discovery of a set of latent, shared dynamical behaviors. Using a beta process prior, the size of the set and the sharing pattern are both inferred from data. We develop efficient Markov chain Monte Carlo methods based on the Indian buffet process representation of the predictive distribution of the beta process, without relying on a truncated model. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth and death proposals. We examine the benefits of our proposed feature-based model on several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.

bayesian inference, health & medicine, time series, (17 more...)

arXiv.org Machine Learning

1111.4226

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.63)

Industry:

Health & Medicine (0.93)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)

Add feedback

Fast Learning Rate of Non-Sparse Multiple Kernel Learning and Optimal Regularization Strategies

Suzuki, Taiji

arXiv.org Machine LearningNov-16-2011

In this paper, we give a new generalization error bound of Multiple Kernel Learning (MKL) for a general class of regularizations, and discuss what kind of regularization gives a favorable predictive accuracy. Our main target in this paper is dense type regularizations including \ellp-MKL. According to the recent numerical experiments, the sparse regularization does not necessarily show a good performance compared with dense type regularizations. Motivated by this fact, this paper gives a general theoretical tool to derive fast learning rates of MKL that is applicable to arbitrary mixed-norm-type regularizations in a unifying manner. This enables us to compare the generalization performances of various types of regularizations. As a consequence, we observe that the homogeneity of the complexities of candidate reproducing kernel Hilbert spaces (RKHSs) affects which regularization strategy (\ell1 or dense) is preferred. In fact, in homogeneous complexity settings where the complexities of all RKHSs are evenly same, \ell1-regularization is optimal among all isotropic norms. On the other hand, in inhomogeneous complexity settings, dense type regularizations can show better learning rate than sparse \ell1-regularization. We also show that our learning rate achieves the minimax lower bound in homogeneous complexity settings.

artificial intelligence, machine learning, regularization, (16 more...)

arXiv.org Machine Learning

1111.3781

Country: North America > Canada (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Estimated VC dimension for risk bounds

McDonald, Daniel J., Shalizi, Cosma Rohilla, Schervish, Mark

arXiv.org Machine LearningNov-14-2011

Vapnik-Chervonenkis (VC) dimension is a fundamental measure of the generalization capacity of learning algorithms. However, apart from a few special cases, it is hard or impossible to calculate analytically. Vapnik et al. [10] proposed a technique for estimating the VC dimension empirically. While their approach behaves well in simulations, it could not be used to bound the generalization risk of classifiers, because there were no bounds for the estimation error of the VC dimension itself. We rectify this omission, providing high probability concentration results for the proposed estimator and deriving corresponding generalization bounds.

artificial intelligence, machine learning, vc dimension, (17 more...)

arXiv.org Machine Learning

1111.3404

Country: North America > United States (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

UPAL: Unbiased Pool Based Active Learning

Ganti, Ravi, Gray, Alexander

arXiv.org Artificial IntelligenceNov-13-2011

In this paper we address the problem of pool based active learning, and provide an algorithm, called UPAL, that works by minimizing the unbiased estimator of the risk of a hypothesis in a given hypothesis space. For the space of linear classifiers and the squared loss we show that UPAL is equivalent to an exponentially weighted average forecaster. Exploiting some recent results regarding the spectra of random matrices allows us to establish consistency of UPAL when the true hypothesis is a linear hypothesis. Empirical comparison with an active learner implementation in Vowpal Wabbit, and a previously proposed pool based active learner implementation show good empirical performance and better scalability.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1111.1784

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Add feedback

An Iterative Algorithm for Fitting Nonconvex Penalized Generalized Linear Models with Grouped Predictors

She, Yiyuan

arXiv.org Machine LearningNov-10-2011

High-dimensional data pose challenges in statistical learning and modeling. Sometimes the predictors can be naturally grouped where pursuing the between-group sparsity is desired. Collinearity may occur in real-world high-dimensional applications where the popular $l_1$ technique suffers from both selection inconsistency and prediction inaccuracy. Moreover, the problems of interest often go beyond Gaussian models. To meet these challenges, nonconvex penalized generalized linear models with grouped predictors are investigated and a simple-to-implement algorithm is proposed for computation. A rigorous theoretical result guarantees its convergence and provides tight preliminary scaling. This framework allows for grouped predictors and nonconvex penalties, including the discrete $l_0$ and the `$l_0+l_2$' type penalties. Penalty design and parameter tuning for nonconvex penalties are examined. Applications of super-resolution spectrum estimation in signal processing and cancer classification with joint gene selection in bioinformatics show the performance improvement by nonconvex penalized estimation.

artificial intelligence, machine learning, penalty, (17 more...)

arXiv.org Machine Learning

0911.5460

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)
Health & Medicine > Therapeutic Area > Hematology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Statistical Topic Models for Multi-Label Document Classification

Rubin, Timothy N., Chambers, America, Smyth, Padhraic, Steyvers, Mark

arXiv.org Machine LearningNov-9-2011

Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as the total number of labels and the number of labels per document increase. This problem is amplified when the label frequencies exhibit the type of highly skewed distributions that are often observed in real-world datasets. In this paper we investigate a class of generative statistical topic models for multi-label documents that associate individual word tokens with different labels. We investigate the advantages of this approach relative to discriminative models, particularly with respect to classification problems involving large numbers of relatively rare labels. We compare the performance of generative and discriminative approaches on document labeling tasks ranging from datasets with several thousand labels to datasets with tens of labels. The experimental results indicate that probabilistic generative models can achieve competitive multi-label classification performance compared to discriminative methods, and have advantages for datasets with many labels and skewed label frequencies.

artificial intelligence, machine learning, natural language, (23 more...)

arXiv.org Machine Learning

1107.2462

Country:

Asia (0.67)
North America > United States > California (0.28)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Law (1.00)
Government > Regional Government (0.45)
Government > Military (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
(3 more...)

Add feedback

The theory and application of penalized methods or Reproducing Kernel Hilbert Spaces made easy

Heckman, Nancy

arXiv.org Machine LearningNov-8-2011

The popular cubic smoothing spline estimate of a regression function arises as the minimizer of the penalized sum of squares $\sum_j(Y_j - {\mu}(t_j))^2 + {\lambda}\int_a^b [{\mu}"(t)]^2 dt$, where the data are $t_j,Y_j$, $j=1,..., n$. The minimization is taken over an infinite-dimensional function space, the space of all functions with square integrable second derivatives. But the calculations can be carried out in a finite-dimensional space. The reduction from minimizing over an infinite dimensional space to minimizing over a finite dimensional space occurs for more general objective functions: the data may be related to the function ${\mu}$ in another way, the sum of squares may be replaced by a more suitable expression, or the penalty, $\int_a^b [{\mu}"(t)]^2 dt$, might take a different form. This paper reviews the Reproducing Kernel Hilbert Space structure that provides a finite-dimensional solution for a general minimization problem. Particular attention is paid to penalties based on linear differential operators. In this case, one can sometimes easily calculate the minimizer explicitly, using Green's functions.

artificial intelligence, machine learning, reproducing kernel hilbert space, (12 more...)

arXiv.org Machine Learning

1111.1915

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)
Health & Medicine > Therapeutic Area > Immunology (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

On the stability of bootstrap estimators

Christmann, Andreas, Salibian-Barrera, Matias, Van Aelst, Stefan

arXiv.org Machine LearningNov-8-2011

It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space is stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases.

artificial intelligence, machine learning, metric space, (17 more...)

arXiv.org Machine Learning

1111.1876

Country:

Europe (0.47)
North America > Canada (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)

Add feedback