AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Convergence rate of Bayesian tensor estimator: Optimal rate without restricted strong convexity

arXiv.org Machine LearningAug-13-2014

In this paper, we investigate the statistical convergence rate of a Bayesian low-rank tensor estimator. Our problem setting is the regression problem where a tensor structure underlying the data is estimated. This problem setting occurs in many practical applications, such as collaborative filtering, multi-task learning, and spatio-temporal data analysis. The convergence rate is analyzed in terms of both in-sample and out-of-sample predictive accuracies. It is shown that a near optimal rate is achieved without any strong convexity of the observation. Moreover, we show that the method has adaptivity to the unknown rank of the true tensor, that is, the near optimal rate depending on the true rank is achieved even if it is not known a priori.

artificial intelligence, machine learning, tensor, (19 more...)

arXiv.org Machine Learning

1408.3092

Country: Asia > Japan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Fastfood: Approximate Kernel Expansions in Loglinear Time

Le, Quoc Viet, Sarlos, Tamas, Smola, Alexander Johannes

arXiv.org Machine LearningAug-13-2014

Despite their successes, what makes kernel methods difficult to use in many large scale problems is the fact that storing and computing the decision function is typically expensive, especially at prediction time. In this paper, we overcome this difficulty by proposing Fastfood, an approximation that accelerates such computation significantly. Key to Fastfood is the observation that Hadamard matrices, when combined with diagonal Gaussian matrices, exhibit properties similar to dense Gaussian random matrices. Yet unlike the latter, Hadamard and diagonal matrices are inexpensive to multiply and store. These two matrices can be used in lieu of Gaussian matrices in Random Kitchen Sinks proposed by Rahimi and Recht (2009) and thereby speeding up the computation for a large range of kernel functions. Specifically, Fastfood requires O(n log d) time and O(n) storage to compute n non-linear basis functions in d dimensions, a significant improvement from O(nd) computation and storage, without sacrificing accuracy. Our method applies to any translation invariant and any dot-product kernel, such as the popular RBF kernels and polynomial kernels. We prove that the approximation is unbiased and has low variance. Experiments show that we achieve similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory. These improvements, especially in terms of memory usage, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.

artificial intelligence, kernel, machine learning, (17 more...)

arXiv.org Machine Learning

1408.306

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Kernel Methods (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models

Meng, Zhaoshi, Wei, Dennis, Wiesel, Ami, Hero, Alfred O. III

arXiv.org Machine LearningAug-13-2014

We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables $p$ is fixed and the number of samples $T$ increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both $p$ and $T$ increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2014.2350956

1303.4756

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.74)

Add feedback

Cluster based RBF Kernel for Support Vector Machines

Czarnecki, Wojciech Marian, Tabor, Jacek

arXiv.org Machine LearningAug-12-2014

In the classical Gaussian SVM classification we use the feature space projection transforming points to normal distributions with fixed covariance matrices (identity in the standard RBF and the covariance of the whole dataset in Mahalanobis RBF). In this paper we add additional information to Gaussian SVM by considering local geometry-dependent feature space projection. We emphasize that our approach is in fact an algorithm for a construction of the new Gaussian-type kernel. We show that better (compared to standard RBF and Mahalanobis RBF) classification results are obtained in the simple case when the space is preliminary divided by k-means into two sets and points are represented as normal distributions with a covariances calculated according to the dataset partitioning. We call the constructed method C$_k$RBF, where $k$ stands for the amount of clusters used in k-means. We show empirically on nine datasets from UCI repository that C$_2$RBF increases the stability of the grid search (measured as the probability of finding good parameters).

artificial intelligence, kernel, machine learning, (16 more...)

arXiv.org Machine Learning

1408.2869

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)

Add feedback

Generalization and Robustness of Batched Weighted Average Algorithm with V-geometrically Ergodic Markov Data

Cuong, Nguyen Viet, Ho, Lam Si Tung, Dinh, Vu

arXiv.org Machine LearningAug-12-2014

We analyze the generalization and robustness of the batched weighted average algorithm for V-geometrically ergodic Markov data. This algorithm is a good alternative to the empirical risk minimization algorithm when the latter suffers from overfitting or when optimizing the empirical risk is hard. For the generalization of the algorithm, we prove a PAC-style bound on the training sample size for the expected $L_1$-loss to converge to the optimal loss when training data are V-geometrically ergodic Markov chains. For the robustness, we show that if the training target variable's values contain bounded noise, then the generalization bound of the algorithm deviates at most by the range of the noise. Our results can be applied to the regression problem, the classification problem, and the case where there exists an unknown deterministic target hypothesis.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1406.3166

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.41)

Add feedback

Robust computation of linear models by convex relaxation

Lerman, Gilad, McCoy, Michael, Tropp, Joel A., Zhang, Teng

arXiv.org Machine LearningAug-11-2014

Consider a dataset of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called REAPER, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors, and it uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when REAPER can approximate this subspace.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

doi: 10.1007/s10208-014-9221-0

1202.4044

Country:

North America > United States > Minnesota (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.63)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Exponentiated Gradient Exploration for Active Learning

Bouneffouf, Djallel

arXiv.org Artificial IntelligenceAug-10-2014

Active learning strategies respond to the costly labelling task in a supervised classification by selecting the most useful unlabelled examples in training a predictive model. Many conventional active learning algorithms focus on refining the decision boundary, rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG Active that can improve any Active learning algorithm by an optimal random exploration. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over the existing active feedback methods.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

1408.2196

Country: North America > United States > New York (0.15)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Add feedback

Gaussian Process Structural Equation Models with Latent Variables

Silva, Ricardo, Gramacy, Robert B.

arXiv.org Machine LearningAug-9-2014

In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. The sparse parameterization is given a full Bayesian treatment without compromising Markov chain Monte Carlo efficiency. We compare the stability of the sampling procedure and the predictive ability of the model against the current practice.

indicator, latent variable, structural equation model, (11 more...)

arXiv.org Machine Learning

1408.2042

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Paraguay > Asunción > Asunción (0.05)
Europe > Greece (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Conditional Probability Tree Estimation Analysis and Algorithms

Beygelzimer, Alina, Langford, John, Lifshits, Yuri, Sorkin, Gregory, Strehl, Alexander L.

arXiv.org Machine LearningAug-9-2014

We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly 106 labels.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Machine Learning

1408.2031

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback

Guess Who Rated This Movie: Identifying Users Through Subspace Clustering

Zhang, Amy, Fawaz, Nadia, Ioannidis, Stratis, Montanari, Andrea

arXiv.org Machine LearningAug-9-2014

It is often the case that, within an online recommender system, multiple users share a common account. Can such shared accounts be identified solely on the basis of the userprovided ratings? Once a shared account is identified, can the different users sharing it be identified as well? Whenever such user identification is feasible, it opens the way to possible improvements in personalized recommendations, but also raises privacy concerns. We develop a model for composite accounts based on unions of linear subspaces, and use subspace clustering for carrying out the identification task. We show that a significant fraction of such accounts is identifiable in a reliable manner, and illustrate potential uses for personalized recommendation.

artificial intelligence, machine learning, movie, (17 more...)

arXiv.org Machine Learning

1408.2055

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California > Santa Clara County (0.28)

Genre: Research Report (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback