AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Fast projections onto mixed-norm balls with applications

arXiv.org Machine LearningApr-6-2012

Joint sparsity offers powerful structural cues for feature selection, especially for variables that are expected to demonstrate a "grouped" behavior. Such behavior is commonly modeled via group-lasso, multitask lasso, and related methods where feature selection is effected via mixed-norms. Several mixed-norm based sparse models have received substantial attention, and for some cases efficient algorithms are also available. Surprisingly, several constrained sparse models seem to be lacking scalable algorithms. We address this deficiency by presenting batch and online (stochastic-gradient) optimization methods, both of which rely on efficient projections onto mixed-norm balls. We illustrate our methods by applying them to the multitask lasso. We conclude by mentioning some open problems.

artificial intelligence, machine learning, projection, (16 more...)

arXiv.org Machine Learning

1204.1437

Country: Europe > Germany (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Add feedback

Tight Sample Complexity of Large-Margin Learning

Sabato, Sivan, Srebro, Nathan, Tishby, Naftali

arXiv.org Machine LearningApr-5-2012

We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L_2 regularization: We introduce the \gamma-adapted-dimension, which is a simple function of the spectrum of a distribution's covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the \gamma-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions.

artificial intelligence, machine learning, sample complexity, (19 more...)

arXiv.org Machine Learning

1011.5053

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Learning to relate images: Mapping units, complex cells and simultaneous eigenspaces

Memisevic, Roland

arXiv.org Artificial IntelligenceApr-5-2012

A fundamental operation in many vision tasks, including motion understanding, stereopsis, visual odometry, or invariant recognition, is establishing correspondences between images or between images and data from other modalities. We present an analysis of the role that multiplicative interactions play in learning such correspondences, and we show how learning and inferring relationships between images can be viewed as detecting rotations in the eigenspaces shared among a set of orthogonal matrices. We review a variety of recent multiplicative sparse coding methods in light of this observation. We also review how the squaring operation performed by energy models and by models of complex cells can be thought of as a way to implement multiplicative interactions. This suggests that the main utility of including complex cells in computational models of vision may be that they can encode relations not invariances.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1110.0107

Country:

Europe > United Kingdom > England (0.28)
North America > United States > New York (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(2 more...)

Add feedback

A General Framework of Dual Certificate Analysis for Structured Sparse Recovery Problems

Zhang, Cun-Hui, Zhang, Tong

arXiv.org Machine LearningApr-4-2012

This paper develops a general theoretical framework to analyze structured sparse recovery problems using the notation of dual certificate. Although certain aspects of the dual certificate idea have already been used in some previous work, due to the lack of a general and coherent theory, the analysis has so far only been carried out in limited scopes for specific problems. In this context the current paper makes two contributions. First, we introduce a general definition of dual certificate, which we then use to develop a unified theory of sparse recovery analysis for convex programming. Second, we present a class of structured sparsity regularization called structured Lasso for which calculations can be readily performed under our theoretical framework. This new theory includes many seemingly loosely related previous work as special cases; it also implies new results that improve existing ones even for standard formulations such as L1 regularization.

artificial intelligence, dual certificate, machine learning, (16 more...)

arXiv.org Machine Learning

1201.3302

Country: North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Relax and Localize: From Value to Algorithms

Rakhlin, Alexander, Shamir, Ohad, Sridharan, Karthik

arXiv.org Machine LearningApr-4-2012

We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones. Our framework also captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2 forecaster. We emphasize that understanding the inherent complexity of the learning problem leads to the development of algorithms. We define local sequential Rademacher complexities and associated algorithms that allow us to obtain faster rates in online learning, similarly to statistical learning theory. Based on these localized complexities we build a general adaptive method that can take advantage of the suboptimality of the observed sequence. We present a number of new algorithms, including a family of randomized methods that use the idea of a "random playout". Several new versions of the Follow-the-Perturbed-Leader algorithms are presented, as well as methods based on the Littlestone's dimension, efficient methods for matrix completion with trace norm, and algorithms for the problems of transductive learning and prediction with static experts.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1204.087

Genre:

Workflow (0.67)
Research Report (0.50)

Industry: Education (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Add feedback

Validation of nonlinear PCA

Scholz, Matthias

arXiv.org Artificial IntelligenceApr-3-2012

Linear principal component analysis (PCA) can be extended to a nonlinear PCA by using artificial neural networks. But the benefit of curved components requires a careful control of the model complexity. Moreover, standard techniques for model selection, including cross-validation and more generally the use of an independent test set, fail when applied to nonlinear PCA because of its inherent unsupervised characteristics. This paper presents a new approach for validating the complexity of nonlinear PCA models by using the error in missing data estimation as a criterion for model selection. It is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy. While standard test set validation usually favours over-fitted nonlinear PCA models, the proposed model validation approach correctly selects the optimal model complexity.

complexity, nonlinear pca, validation, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s11063-012-9220-6

1204.0684

Country:

North America > United States > New York (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery

Goodfellow, Ian J., Courville, Aaron, Bengio, Yoshua

arXiv.org Machine LearningApr-3-2012

We consider the problem of using a factor model we call {\em spike-and-slab sparse coding} (S3C) to learn features for a classification task. The S3C model resembles both the spike-and-slab RBM and sparse coding. Since exact inference in this model is intractable, we derive a structured variational inference procedure and employ a variational EM training algorithm. Prior work on approximate inference for this model has not prioritized the ability to exploit parallel architectures and scale to enormous problem sizes. We present an inference procedure appropriate for use with GPUs which allows us to dramatically increase both the training set size and the amount of latent factors. We demonstrate that this approach improves upon the supervised learning capabilities of both sparse coding and the ssRBM on the CIFAR-10 dataset. We evaluate our approach's potential for semi-supervised learning on subsets of CIFAR-10. We demonstrate state-of-the art self-taught learning performance on the STL-10 dataset and use our method to win the NIPS 2011 Workshop on Challenges In Learning Hierarchical Models' Transfer Learning Challenge.

artificial intelligence, machine learning, sparse, (16 more...)

arXiv.org Machine Learning

1201.3382

Country: North America > United States (0.29)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Covering Numbers for Convex Functions

Guntuboyina, Adityanand, Sen, Bodhisattva

arXiv.org Machine LearningMar-31-2012

In this paper we study the covering numbers of the space of convex and uniformly bounded functions in multi-dimension. We find optimal upper and lower bounds for the $\epsilon$-covering number of $\C([a, b]^d, B)$, in the $L_p$-metric, $1 \le p < \infty$, in terms of the relevant constants, where $d \geq 1$, $a < b \in \mathbb{R}$, $B>0$, and $\C([a,b]^d, B)$ denotes the set of all convex functions on $[a, b]^d$ that are uniformly bounded by $B$. We summarize previously known results on covering numbers for convex functions and also provide alternate proofs of some known results. Our results have direct implications in the study of rates of convergence of empirical minimization procedures as well as optimal convergence rates in the numerous convexity constrained function estimation problems.

artificial intelligence, convex function, machine learning, (16 more...)

arXiv.org Machine Learning

1204.0147

Country: North America > United States > California (0.28)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Transforming Graph Representations for Statistical Relational Learning

Rossi, Ryan A., McDowell, Luke K., Aha, David W., Neville, Jennifer

arXiv.org Artificial IntelligenceMar-30-2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation--for the nodes, links, and features--can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.

machine learning, natural language, node, (18 more...)

arXiv.org Artificial Intelligence

1204.0033

Country:

Europe (0.67)
North America > United States > Massachusetts (0.27)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Services (1.00)
Health & Medicine (0.92)
Telecommunications (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Add feedback

Reproducing Kernel Banach Spaces with the l1 Norm

Song, Guohui, Zhang, Haizhang, Hickernell, Fred J.

arXiv.org Machine LearningMar-28-2012

Targeting at sparse learning, we construct Banach spaces B of functions on an input space X with the properties that (1) B possesses an l1 norm in the sense that it is isometrically isomorphic to the Banach space of integrable functions on X with respect to the counting measure; (2) point evaluations are continuous linear functionals on B and are representable through a bilinear form with a kernel function; (3) regularized learning schemes on B satisfy the linear representer theorem. Examples of kernel functions admissible for the construction of such spaces are given.

artificial intelligence, machine learning, representer theorem, (14 more...)

arXiv.org Machine Learning

doi: 10.1016/j.acha.2012.03.009

1101.4388

Country:

North America > United States (1.00)
Asia > China > Guangdong Province (0.28)

Genre: Research Report (0.50)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback