AITopics | Statistical Learning

Statistical Learning

Online Robust Subspace Tracking from Partial Information

He, Jun, Balzano, Laura, Lui, John C. S.

arXiv.org Machine Learning, Sep-20-2011

This paper presents GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm), an efficient and robust online algorithm for tracking subspaces from highly incomplete information. The algorithm uses a robust $l^1$-norm cost function in order to estimate and track non-stationary subspaces when the streaming data vectors are corrupted with outliers. We apply GRASTA to the problems of robust matrix completion and real-time separation of background from foreground in video. In this second application, we show that GRASTA performs high-quality separation of moving objects from background at exceptional speeds: In one popular benchmark video example, GRASTA achieves a rate of 57 frames per second, even when run in MATLAB on a personal laptop.
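The robustness of an $l^1$ cost to outliers can be seen in a one-dimensional toy case. The sketch below is not GRASTA and does no subspace tracking. It only compares the minimizer of a squared-error cost (the mean) with the minimizer of an absolute-error cost (the median) on made-up data with one gross outlier.

```python
import statistics

# Hypothetical measurements of a quantity near 1.0, with one gross outlier.
samples = [1.0, 1.1, 0.9, 1.0, 50.0]

# The mean minimizes the sum of squared residuals (an l2-type cost).
l2_fit = statistics.mean(samples)

# The median minimizes the sum of absolute residuals (an l1-type cost).
l1_fit = statistics.median(samples)

print("l2 fit:", l2_fit)
print("l1 fit:", l1_fit)
```

The single outlier drags the squared-error estimate far from the bulk of the data, while the absolute-error estimate stays at a typical value.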

artificial intelligence, machine learning, subspace, (18 more...)

arXiv: 1109.3827

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (1.00)

Industry: Media (0.47)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Learning Discriminative Metrics via Generative Models and Kernel Learning

Shi, Yuan, Noh, Yung-Kyun, Sha, Fei, Lee, Daniel D.

arXiv.org Machine Learning, Sep-19-2011

Metrics specifying distances between data points can be learned in a discriminative manner or from generative models. In this paper, we show how to unify generative and discriminative learning of metrics via a kernel learning framework. Specifically, we learn local metrics optimized from parametric generative models. These are then used as base kernels to construct a global kernel that minimizes a discriminative training criterion. We consider both linear and nonlinear combinations of local metric kernels. Our empirical results show that these combinations significantly improve performance on classification tasks. The proposed learning algorithm is also very efficient, achieving an order-of-magnitude speedup in training time compared to previous discriminative baseline methods.

dataset, kernel, metric, (16 more...)

arXiv: 1109.394

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Differentially Private Online Learning

Jain, Prateek, Kothari, Pravesh, Thakurta, Abhradeep

arXiv.org Machine Learning, Sep-16-2011

In this paper, we consider the problem of preserving privacy in the online learning setting. We study the problem in the online convex programming (OCP) framework---a popular online learning setting with several interesting theoretical and practical implications---while using differential privacy as the formal privacy measure. For this problem, we distill two critical attributes that a private OCP algorithm should have in order to provide reasonable privacy as well as utility guarantees: 1) linearly decreasing sensitivity, i.e., as new data points arrive their effect on the learning model decreases, 2) sub-linear regret bound---regret bound is a popular goodness/utility measure of an online learning algorithm. Given an OCP algorithm that satisfies these two conditions, we provide a general framework to convert the given algorithm into a privacy-preserving OCP algorithm with good (sub-linear) regret. We then illustrate our approach by converting two popular online learning algorithms into their differentially private variants while guaranteeing sub-linear regret ($O(\sqrt{T})$). Next, we consider the special case of online linear regression problems, a practically important class of online learning problems, for which we generalize an approach by Dwork et al. to provide a differentially private algorithm with just $O(\log^{1.5} T)$ regret. Finally, we show that our online learning framework can be used to provide differentially private algorithms for offline learning as well. For the offline learning problem, our approach obtains better error bounds and can handle a larger class of problems than the existing state-of-the-art methods of Chaudhuri et al.
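The regret measure mentioned in the abstract can be made concrete with a small, non-private sketch. The code below runs plain projected online gradient descent on hypothetical squared losses over the interval [-1, 1] and compares its cumulative loss with the best fixed point in hindsight. It contains no privacy mechanism and is not the authors' algorithm. The loss family, the step-size schedule, and all names are assumptions chosen for illustration.

```python
import math


def project(x, lo=-1.0, hi=1.0):
    """Project a scalar onto the feasible interval [lo, hi]."""
    return max(lo, min(hi, x))


def ogd_regret(ys, diameter=2.0, grad_bound=4.0):
    """Regret of projected online gradient descent on f_t(x) = (x - y_t)^2.

    Assumes every y_t lies in [-1, 1], so the feasible set has diameter 2
    and gradients are bounded by 4 in absolute value.
    """
    x = 0.0
    total_loss = 0.0
    for t, y in enumerate(ys, start=1):
        total_loss += (x - y) ** 2             # suffer the loss at the current point
        grad = 2.0 * (x - y)                   # gradient of the revealed loss
        eta = diameter / (grad_bound * math.sqrt(t))  # decaying step size
        x = project(x - eta * grad)
    # Best fixed decision in hindsight for squared losses is the mean of y_t.
    best = project(sum(ys) / len(ys))
    best_loss = sum((best - y) ** 2 for y in ys)
    return total_loss - best_loss


# Hypothetical target sequence with values in {-1, 1}.
ys = [1.0 if t % 3 else -1.0 for t in range(1000)]
regret = ogd_regret(ys)
print("regret:", regret, "sqrt(T):", math.sqrt(len(ys)))
```

With this step size, the standard analysis of online gradient descent bounds the regret by 1.5 times the gradient bound times the diameter times $\sqrt{T}$, which is the kind of sub-linear growth the abstract refers to.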

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv: 1109.0105

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Convex and Network Flow Optimization for Structured Sparsity

Mairal, Julien, Jenatton, Rodolphe, Obozinski, Guillaume, Bach, Francis

arXiv.org Machine Learning, Sep-16-2011

We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of $\ell_2$- or $\ell_\infty$-norms over groups of variables. Whereas much effort has been put into developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of $\ell_\infty$-norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches.
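For orientation, the simpler disjoint-group case that the abstract contrasts with has a closed-form proximal operator. The sketch below implements that well-known block soft-thresholding step for a sum of Euclidean norms over non-overlapping groups. It does not cover the overlapping-group setting or the network flow method of the paper, and the function name and example data are hypothetical.

```python
import math


def prox_group_l2(v, groups, lam):
    """Proximal operator of lam * sum_g ||v_g||_2 for DISJOINT groups.

    Each group is shrunk toward zero by block soft-thresholding:
    v_g <- v_g * max(0, 1 - lam / ||v_g||_2).
    """
    out = list(v)
    for g in groups:
        norm = math.sqrt(sum(v[i] ** 2 for i in g))
        # A group whose norm is at most lam is set to zero entirely.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * v[i]
    return out


# Hypothetical vector with two disjoint groups of two variables each.
v = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = prox_group_l2(v, groups, lam=1.0)
print(shrunk)
```

The first group has norm 5 and is only shrunk, while the second group has norm below the threshold and is zeroed as a whole, which is the group-level sparsity such norms induce.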

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv: 1104.1872

Country: North America > United States > California (0.27)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.45)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Beta processes, stick-breaking, and power laws

Broderick, Tamara, Jordan, Michael I., Pitman, Jim

arXiv.org Machine Learning, Sep-15-2011

The beta-Bernoulli process provides a Bayesian nonparametric prior for models involving collections of binary-valued features. A draw from the beta process yields an infinite collection of probabilities in the unit interval, and a draw from the Bernoulli process turns these into binary-valued features. Recent work has provided stick-breaking representations for the beta process analogous to the well-known stick-breaking representation for the Dirichlet process. We derive one such stick-breaking representation directly from the characterization of the beta process as a completely random measure. This approach motivates a three-parameter generalization of the beta process, and we study the power laws that can be obtained from this generalized beta process. We present a posterior inference algorithm for the beta-Bernoulli process that exploits the stick-breaking representation, and we present experimental results for a discrete factor-analysis model.
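The well-known stick-breaking representation for the Dirichlet process, which the abstract uses as its point of comparison, can be sketched in a few lines. The code below draws a truncated set of Dirichlet process weights by repeatedly breaking off a Beta(1, alpha) fraction of the remaining stick. It is not the beta process construction derived in the paper, and the truncation level, seed, and names are assumptions.

```python
import random


def dp_stick_breaking(alpha, num_sticks, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    Returns the first `num_sticks` weights and the length of stick
    left over after truncation.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)   # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)                    # fixed seed so the draw is repeatable
weights, leftover = dp_stick_breaking(alpha=2.0, num_sticks=50, rng=rng)
print(sum(weights), leftover)
```

The weights and the leftover piece always add up to the unit stick, so truncating at a finite number of sticks only discards the small remaining mass.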

artificial intelligence, beta process, machine learning, (17 more...)

arXiv: 1106.0539

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Reconstruction of sequential data with density models

Carreira-Perpiñán, Miguel Á.

arXiv.org Machine Learning, Sep-14-2011

We introduce the problem of reconstructing a sequence of multidimensional real vectors where some of the data are missing. This problem contains regression and mapping inversion as particular cases where the pattern of missing data is independent of the sequence index. The problem is hard because it involves possibly multivalued mappings at each vector in the sequence, where the missing variables can take more than one value given the present variables; and the set of missing variables can vary from one vector to the next. To solve this problem, we propose an algorithm based on two redundancy assumptions: vector redundancy (the data live in a low-dimensional manifold), so that the present variables constrain the missing ones; and sequence redundancy (e.g. continuity), so that consecutive vectors constrain each other. We capture the low-dimensional nature of the data in a probabilistic way with a joint density model, here the generative topographic mapping, which results in a Gaussian mixture. Candidate reconstructions at each vector are obtained as all the modes of the conditional distribution of missing variables given present variables. The reconstructed sequence is obtained by minimising a global constraint, here the sequence length, by dynamic programming. We present experimental results for a toy problem and for inverse kinematics of a robot arm.
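The final step described above, choosing one candidate per vector so that the total sequence length is minimal, can be sketched as a shortest-path dynamic program. The code below assumes the candidate reconstructions at each step are already given as lists of points. It does not fit a density model or find conditional modes, and the function name and toy candidates are hypothetical.

```python
import math


def shortest_path_reconstruction(candidates):
    """Pick one candidate per step, minimizing total Euclidean path length.

    candidates[t] is a list of candidate vectors (tuples) for step t.
    Returns the chosen sequence and its length.
    """
    cost = [0.0] * len(candidates[0])      # best length ending at each candidate
    back = []                              # back-pointers for each later step
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for cur in candidates[t]:
            options = [cost[j] + math.dist(prev, cur)
                       for j, prev in enumerate(candidates[t - 1])]
            j_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[j_best])
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)
    # Trace the optimal path backwards from the cheapest final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    total = cost[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)], total


# Toy example: two branches in the middle, only one gives a short path.
candidates = [
    [(0.0, 0.0)],
    [(1.0, 1.0), (1.0, -1.0)],
    [(2.0, 2.0), (2.0, -2.0)],
    [(3.0, 3.0)],
]
sequence, length = shortest_path_reconstruction(candidates)
print(sequence, length)
```

Because the cost only couples consecutive vectors, the search over all candidate combinations reduces to one pass with work proportional to the number of candidate pairs per step.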

artificial intelligence, machine learning, reconstruction, (19 more...)

arXiv: 1109.3248

Country:

North America > United States (0.93)
Europe > United Kingdom > England (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Data-driven calibration of linear estimators with minimal penalties

Arlot, Sylvain, Bach, Francis

arXiv.org Machine Learning, Sep-13-2011

This paper tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression, spline smoothing or locally weighted regression, and the choice of a kernel in multiple kernel learning. We propose a new algorithm which first consistently estimates the variance of the noise, based upon the concept of minimal penalty, which was previously introduced in the context of model selection. Then, plugging our variance estimate into Mallows' $C_L$ penalty is proved to lead to an algorithm satisfying an oracle inequality. Simulation experiments with kernel ridge regression and multiple kernel learning show that the proposed algorithm often significantly improves on existing calibration procedures such as generalized cross-validation.
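As a reminder of the plug-in step, the standard form of Mallows' $C_L$ criterion for a linear estimator can be written out. The notation here is an assumption and not taken from the abstract: $Y \in \mathbb{R}^n$ is the response vector, $A_\lambda$ is the smoothing matrix of the estimator indexed by $\lambda$, and $\hat{\sigma}^2$ is whatever noise-variance estimate is plugged in. The kernel ridge smoother is given under one common normalization.

```latex
% Linear estimator indexed by \lambda
\hat{F}_\lambda = A_\lambda Y

% Mallows' C_L criterion with a plug-in variance estimate
C_L(\lambda) = \frac{1}{n}\,\lVert Y - A_\lambda Y \rVert_2^2
             + \frac{2\,\hat{\sigma}^2\,\operatorname{tr}(A_\lambda)}{n},
\qquad
\hat{\lambda} \in \arg\min_{\lambda} C_L(\lambda)

% Example: kernel ridge regression with kernel matrix K
A_\lambda = K\,(K + n\lambda I_n)^{-1}
```

The trace term plays the role of an effective number of parameters, so the quality of the selected $\hat{\lambda}$ depends directly on how well the noise variance is estimated.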

artificial intelligence, assumption, machine learning, (18 more...)

arXiv: 0909.1884

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Modern hierarchical, agglomerative clustering algorithms

Müllner, Daniel

arXiv.org Machine Learning, Sep-12-2011

This paper presents algorithms for hierarchical, agglomerative clustering which perform most efficiently in the general-purpose setup that is given in modern standard software. Requirements are: (1) the input data is given by pairwise dissimilarities between data points, but extensions to vector data are also discussed; (2) the output is a "stepwise dendrogram", a data structure which is shared by all implementations in current standard software. We present algorithms (old and new) which perform clustering in this setting efficiently, both in an asymptotic worst-case analysis and from a practical point of view. The main contributions of this paper are: (1) We present a new algorithm which is suitable for any distance update scheme and performs significantly better than the existing algorithms. (2) We prove the correctness of two algorithms by Rohlf and Murtagh, which is necessary in each case for different reasons. (3) We give well-founded recommendations for the best current algorithms for the various agglomerative clustering schemes.
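The input and output described above can be illustrated with a deliberately naive single-linkage procedure. The sketch below takes a matrix of pairwise dissimilarities and returns the merges in order as (label, label, distance) triples. It is a baseline for understanding the data structure, not one of the efficient algorithms the paper presents, and the convention of labeling each new cluster with the next unused integer is an assumption.

```python
def single_linkage(dist):
    """Naive single-linkage clustering from a dissimilarity matrix.

    Returns a stepwise dendrogram: one (label_a, label_b, distance)
    triple per merge, in the order the merges happen.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # label -> original point indices
    merges = []
    next_label = n
    while len(clusters) > 1:
        labels = sorted(clusters)
        best = None
        for idx, a in enumerate(labels):
            for b in labels[idx + 1:]:
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_label] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_label += 1
    return merges


# Hypothetical points on a line, converted to pairwise dissimilarities.
points = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(p - q) for q in points] for p in points]
merges = single_linkage(dist)
print(merges)
```

For n points the result always has n - 1 rows, and each row refers either to an original point or to a cluster created by an earlier row.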

artificial intelligence, data mining, machine learning, (19 more...)

arXiv: 1109.2378

Country:

Europe (0.92)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Characterization and exploitation of community structure in cover song networks

Serrà, Joan, Zanin, Massimiliano, Herrera, Perfecto, Serra, Xavier

arXiv.org Machine Learning, Sep-12-2011

The use of community detection algorithms is explored within the framework of cover song identification, i.e. the automatic detection of different audio renditions of the same underlying musical piece. Until now, this task has been posed as a typical query-by-example task, where one submits a query song and the system retrieves a list of possible matches ranked by their similarity to the query. In this work, we propose a new approach which uses song communities to provide more relevant answers to a given query. Starting from the output of a state-of-the-art system, songs are embedded in a complex weighted network whose links represent similarity (related musical content). Communities inside the network are then recognized as groups of covers and this information is used to enhance the results of the system. In particular, we show that this approach increases both the coherence and the accuracy of the system. Furthermore, we provide insight into the internal organization of individual cover song communities, showing that there is a tendency for the original song to be central within the community. We postulate that the methods and results presented here could be relevant to other query-by-example tasks.

artificial intelligence, data mining, machine learning, (17 more...)

doi: 10.1016/j.patrec.2012.02.013

arXiv: 1108.6003

Country:

North America > United States (0.46)
Europe > United Kingdom > England (0.46)
Europe > Spain (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning

Rupp, Matthias, Tkatchenko, Alexandre, Müller, Klaus-Robert, von Lilienfeld, O. Anatole

arXiv.org Machine Learning, Sep-12-2011

Cross-validation on 7165 molecules yields a mean absolute error of 9.9 kcal/mol, which is an order of magnitude more accurate than counting bonds or semiempirical quantum chemistry. We use the GDB database, a library of nearly one billion organic molecules that are stable and synthetically accessible according to organic chemistry rules [15]. While potentially applicable to any stoichiometry, as a proof of principle we restrict ourselves to small organic molecules. Specifically, we define a controlled test-bed consisting of all 7165 organic molecules from the GDB database with up to seven "heavy" atoms that contain C, N, O, or S, being saturated with hydrogen atoms. Atomization energies range from -800 to -2000 kcal/mol.

artificial intelligence, machine learning, molecule, (13 more...)

doi: 10.1103/PhysRevLett.108.058301

arXiv: 1109.2618

Country: North America > United States > California > Los Angeles County > Los Angeles (0.29)

Genre: Research Report (0.82)

Industry:

Energy (0.47)
Materials > Chemicals (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
