AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Provable Tensor Factorization with Missing Data

arXiv.org Machine LearningJun-11-2014

We study the problem of low-rank tensor factorization in the presence of missing data. We ask the following question: how many sampled entries do we need, to efficiently and exactly reconstruct a tensor with a low-rank orthogonal decomposition? We propose a novel alternating minimization based method which iteratively refines estimates of the singular vectors. We show that under certain standard assumptions, our method can recover a three-mode $n\times n\times n$ dimensional rank-$r$ tensor exactly from $O(n^{3/2} r^5 \log^4 n)$ randomly sampled entries. In the process of proving this result, we solve two challenging sub-problems for tensors with missing data. First, in the process of analyzing the initialization step, we prove a generalization of a celebrated result by Szemer\'edie et al. on the spectrum of random graphs. Next, we prove global convergence of alternating minimization with a good initialization. Simulations suggest that the dependence of the sample size on dimensionality $n$ is indeed tight.

data quality, machine learning, tensor, (17 more...)

arXiv.org Machine Learning

1406.2784

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Quality (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Learning Latent Variable Gaussian Graphical Models

Meng, Zhaoshi, Eriksson, Brian, Hero, Alfred O. III

arXiv.org Machine LearningJun-10-2014

Gaussian graphical models (GGM) have been widely used in many high-dimensional applications ranging from biological and financial data to recommender systems. Sparsity in GGM plays a central role both statistically and computationally. Unfortunately, real-world data often does not fit well to sparse graphical models. In this paper, we focus on a family of latent variable Gaussian graphical models (LVGGM), where the model is conditionally sparse given latent variables, but marginally non-sparse. In LVGGM, the inverse covariance matrix has a low-rank plus sparse structure, and can be learned in a regularized maximum likelihood framework. We derive novel parameter estimation error bounds for LVGGM under mild conditions in the high-dimensional setting. These results complement the existing theory on the structural learning, and open up new possibilities of using LVGGM for statistical inference.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1406.2721

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Equivalence of Learning Algorithms

Audiffren, Julien, Kadri, Hachem

arXiv.org Machine LearningJun-10-2014

The purpose of this paper is to introduce a concept of equivalence between machine learning algorithms. We define two notions of algorithmic equivalence, namely, weak and strong equivalence. These notions are of paramount importance for identifying when learning prop erties from one learning algorithm can be transferred to another. Using regularized kernel machines as a case study, we illustrate the importance of the introduced equivalence concept by analyzing the relation between kernel ridge regression (KRR) and m-power regularized least squares regression (M-RLSR) algorithms.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1406.2622

Country: Europe > France (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)

Add feedback

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

Dauphin, Yann, Pascanu, Razvan, Gulcehre, Caglar, Cho, Kyunghyun, Ganguli, Surya, Bengio, Yoshua

arXiv.org Machine LearningJun-10-2014

A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new approach to second-order optimization, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep or recurrent neural network training, and provide numerical evidence for its superior optimization performance. This work extends the results of Pascanu et al. (2014).

artificial intelligence, machine learning, saddle point, (17 more...)

arXiv.org Machine Learning

1406.2572

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Feature Selection For High-Dimensional Clustering

Wasserman, Larry, Azizyan, Martin, Singh, Aarti

arXiv.org Machine LearningJun-9-2014

There are many methods for feature selection in high-dimensional classification and regression. These methods require assumptions such as sparsity and incoherence. Some methods (Fan and Lv 2008) also assume that relevant variables are detectable through marginal correlations. Given these assumptions, one can prove guarantees for the performance of the method. A similar theory for feature selection in clustering is lacking. There exist a number of methods but they do not come with precise assumptions and guarantees. In this paper we propose a method involving two steps: 1. A screening step to eliminate uninformative features.

artificial intelligence, assumption, machine learning, (13 more...)

arXiv.org Machine Learning

1406.224

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

Azizyan, Martin, Singh, Aarti, Wasserman, Larry

arXiv.org Machine LearningJun-9-2014

We consider the problem of clustering data points in high dimensions, i.e. when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while only scaling logarithmically with the ambient dimension. Additionally, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we do not require spherical clusters nor necessitate mean separation along relevant dimensions.

artificial intelligence, machine learning, relevant feature, (16 more...)

arXiv.org Machine Learning

1406.2206

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Learning directed acyclic graphs via bootstrap aggregating

Wang, Ru, Peng, Jie

arXiv.org Machine LearningJun-9-2014

Probabilistic graphical models are graphical representations of probability distributions. Graphical models have applications in many fields including biology, social sciences, linguistic, neuroscience. In this paper, we propose directed acyclic graphs (DAGs) learning via bootstrap aggregating. The proposed procedure is named as DAGBag. Specifically, an ensemble of DAGs is first learned based on bootstrap resamples of the data and then an aggregated DAG is derived by minimizing the overall distance to the entire ensemble. A family of metrics based on the structural hamming distance is defined for the space of DAGs (of a given node set) and is used for aggregation. Under the high-dimensional-low-sample size setting, the graph learned on one data set often has excessive number of false positive edges due to over-fitting of the noise. Aggregation overcomes over-fitting through variance reduction and thus greatly reduces false positives. We also develop an efficient implementation of the hill climbing search algorithm of DAG learning which makes the proposed method computationally competitive for the high-dimensional regime. The DAGBag procedure is implemented in the R package dagbag.

artificial intelligence, gshd, machine learning, (17 more...)

arXiv.org Machine Learning

1406.2098

Country: North America > United States > California (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)
(2 more...)

Add feedback

Box Drawings for Learning with Imbalanced Data

Goh, Siong Thye, Rudin, Cynthia

arXiv.org Machine LearningJun-7-2014

The vast majority of real world classification problems are imbalanced, meaning there are far fewer data from the class of interest (the positive class) than from other classes. We propose two machine learning algorithms to handle highly imbalanced classification problems. The classifiers are disjunctions of conjunctions, and are created as unions of parallel axis rectangles around the positive examples, and thus have the benefit of being interpretable. The first algorithm uses mixed integer programming to optimize a weighted balance between positive and negative class accuracies. Regularization is introduced to improve generalization performance. The second method uses an approximation in order to assist with scalability. Specifically, it follows a characterize then discriminate approach, where the positive class is characterized first by boxes, and then each box boundary becomes a separate discriminative classifier. This method has the computational advantages that it can be easily parallelized, and considers only the relevant regions of feature space.

algorithm, boundary, classifier, (16 more...)

arXiv.org Machine Learning

1403.3378

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Add feedback

Compressed Gaussian Process

Guhaniyogi, Rajarshi, Dunson, David B.

arXiv.org Machine LearningJun-7-2014

Nonparametric regression for massive numbers of samples (n) and features (p) is an increasingly important problem. In big n settings, a common strategy is to partition the feature space, and then separately apply simple models to each partition set. We propose an alternative approach, which avoids such partitioning and the associated sensitivity to neighborhood choice and distance metrics, by using random compression combined with Gaussian process regression. The proposed approach is particularly motivated by the setting in which the response is conditionally independent of the features given the projection to a low dimensional manifold. Conditionally on the random compression matrix and a smoothness parameter, the posterior distribution for the regression surface and posterior predictive distributions are available analytically. Running the analysis in parallel for many random compression matrices and smoothness parameters, model averaging is used to combine the results. The algorithm can be implemented rapidly even in very big n and p problems, has strong theoretical justification, and is found to yield state of the art predictive performance.

data mining, machine learning, predictive interval, (22 more...)

arXiv.org Machine Learning

1406.1916

Genre: Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(3 more...)

Add feedback

Online Clustering of Bandits

Gentile, Claudio, Li, Shuai, Zappella, Giovanni

arXiv.org Machine LearningJun-6-2014

We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.

algorithm, big data, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1401.8257

Country:

Europe (0.14)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Communications (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback