AITopics

1402.4293

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.29)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Klami, Arto, Bouchard, Guillaume, Tripathi, Abhishek

Group-sparse Embeddings in Collective Matrix Factorization

arXiv.org Machine LearningFeb-18-2014

CMF is a technique for simultaneously learning low-rank representations based on a collection of matrices with shared entities. A typical example is the joint modeling of user-item, item-property, and user-feature matrices in a recommender system. The key idea in CMF is that the embeddings are shared across the matrices, which enables transferring information between them. The existing solutions, however, break down when the individual matrices have low-rank structure not shared with others. In this work we present a novel CMF solution that allows each of the matrices to have a separate low-rank structure that is independent of the other matrices, as well as structures that are shared only by a subset of them. We compare MAP and variational Bayesian solutions based on alternating optimization algorithms and show that the model automatically infers the nature of each factor using group-wise sparsity. Our approach supports in a principled way continuous, binary and count observations and is efficient for sparse matrices involving missing data. We illustrate the solution on a number of examples, focusing in particular on an interesting use-case of augmented multi-view learning.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1312.5921

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Jung, Sungkyu, Qiao, Xingye

A Statistical Approach to Set Classification by Feature Selection with Applications to Classification of Histopathology Images

arXiv.org Machine LearningFeb-18-2014

Set classification problems arise when classification tasks are based on sets of observations as opposed to individual observations. In set classification, a classification rule is trained with $N$ sets of observations, where each set is labeled with class information, and the prediction of a class label is performed also with a set of observations. Data sets for set classification appear, for example, in diagnostics of disease based on multiple cell nucleus images from a single tissue. Relevant statistical models for set classification are introduced, which motivate a set classification framework based on context-free feature extraction. By understanding a set of observations as an empirical distribution, we employ a data-driven method to choose those features which contain information on location and major variation. In particular, the method of principal component analysis is used to extract the features of major variation. Multidimensional scaling is used to represent features as vector-valued points on which conventional classifiers can be applied. The proposed set classification approaches achieve better classification results than competing methods in a number of simulated data examples. The benefits of our method are demonstrated in an analysis of histopathology images of cell nuclei related to liver cancer.

artificial intelligence, classification, machine learning, (17 more...)

doi: 10.1111/biom.12164

1402.4539

Country:

North America > United States > Pennsylvania (0.28)
North America > United States > California (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures

Anderson, Joseph, Belkin, Mikhail, Goyal, Navin, Rademacher, Luis, Voss, James

In this paper we show that very large mixtures of Gaussians are efficiently learnable in high dimension. More precisely, we prove that a mixture with known identical covariance matrices whose number of components is a polynomial of any fixed degree in the dimension n is polynomially learnable as long as a certain non-degeneracy condition on the means is satisfied. It turns out that this condition is generic in the sense of smoothed complexity, as soon as the dimensionality of the space is high enough. Moreover, we prove that no such condition can possibly exist in low dimension and the problem of learning the parameters is generically hard. In contrast, much of the existing work on Gaussian Mixtures relies on low-dimensional projections and thus hits an artificial barrier. Our main result on mixture recovery relies on a new "Poissonization"-based technique, which transforms a mixture of Gaussians to a linear map of a product distribution. The problem of learning this map can be efficiently solved using some recent results on tensor decompositions and Independent Component Analysis (ICA), thus giving an algorithm for recovering the mixture. In addition, we combine our low-dimensional hardness results for Gaussian mixtures with Poissonization to show how to embed difficult instances of low-dimensional Gaussian mixtures into the ICA setting, thus establishing exponential information-theoretic lower bounds for underdetermined ICA in low dimension. To the best of our knowledge, this is the first such result in the literature. In addition to contributing to the problem of Gaussian mixture learning, we believe that this work is among the first steps toward better understanding the rare phenomenon of the "blessing of dimensionality" in the computational aspects of statistical inference.

gaussian, ica model, underdeterminedica, (16 more...)

1311.2891

Country:

North America > United States > Ohio (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)

Dimensionality reduction with subgaussian matrices: a unified theory

Dirksen, Sjoerd

We present a theory for Euclidean dimensionality reduction with subgaussian matrices which unifies several restricted isometry property and Johnson-Lindenstrauss type results obtained earlier for specific data sets. In particular, we recover and, in several cases, improve results for sets of sparse and structured sparse vectors, low-rank matrices and tensors, and smooth manifolds. In addition, we establish a new Johnson-Lindenstrauss embedding for data sets taking the form of an infinite union of subspaces of a Hilbert space.

artificial intelligence, data mining, machine learning, (17 more...)

1402.3973

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)

Continuous Learning: Engineering Super Features With Feature Algebras

Tetelman, Michael

In this paper we consider a problem of searching a space of predictive models for a given training data set. We propose an iterative procedure for deriving a sequence of improving models and a corresponding sequence of sets of non-linear features on the original input space. After a finite number of iterations N, the non-linear features become 2^N -degree polynomials on the original space. We show that in a limit of an infinite number of iterations derived non-linear features must form an associative algebra: a product of two features is equal to a linear combination of features from the same feature space for any given input point. Because each iteration consists of solving a series of convex problems that contain all previous solutions, the likelihood of the models in the sequence is increasing with each iteration while the dimension of the model parameter space is set to a limited controlled value.

artificial intelligence, iteration, machine learning, (17 more...)

1312.5398

Country: North America > United States (0.14)

Genre: Research Report (0.65)

Industry:

Education > Educational Setting > Continuing Education (0.41)
Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Aravkin, Aleksandr Y., Choromanska, Anna, Jebara, Tony, Kanevsky, Dimitri

Semistochastic Quadratic Bound Methods

Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood estimation based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.

artificial intelligence, iteration, machine learning, (14 more...)

1309.1369

Country: Europe (0.28)

Genre:

Research Report > Promising Solution (0.55)
Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Hurley, Barry, Kotthoff, Lars, Malitsky, Yuri, O'Sullivan, Barry

Proteus: A Hierarchical Portfolio of Solvers and Transformations

arXiv.org Artificial IntelligenceFeb-17-2014

In recent years, portfolio approaches to solving SAT problems and CSPs have become increasingly common. There are also a number of different encodings for representing CSPs as SAT instances. In this paper, we leverage advances in both SAT and CSP solving to present a novel hierarchical portfolio-based approach to CSP solving, which we call Proteus, that does not rely purely on CSP solvers. Instead, it may decide that it is best to encode a CSP problem instance into SAT, selecting an appropriate encoding and a corresponding SAT solver. Our experimental evaluation used an instance of Proteus that involved four CSP solvers, three SAT encodings, and six SAT solvers, evaluated on the most challenging problem instances from the CSP solver competitions, involving global and intensional constraints. We show that significant performance improvements can be achieved by Proteus obtained by exploiting alternative view-points and solvers for combinatorial problem-solving.

artificial intelligence, machine learning, solver, (17 more...)

arXiv.org Artificial Intelligence

1306.5606

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Cerra, Daniele, Datcu, Mihai, Reinartz, Peter

Authorship Analysis based on Data Compression

arXiv.org Machine LearningFeb-14-2014

This paper proposes to perform authorship analysis using the Fast Compression Distance (FCD), a similarity measure based on compression with dictionaries directly extracted from the written texts. The FCD computes a similarity between two documents through an effective binary search on the intersection set between the two related dictionaries. In the reported experiments the proposed method is applied to documents which are heterogeneous in style, written in five different languages and coming from different historical periods. Results are comparable to the state of the art and outperform traditional compression-based methods.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1016/j.patrec.2014.01.019

1402.3405

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Berglund, Mathias, Raiko, Tapani

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

arXiv.org Machine LearningFeb-14-2014

Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training the weights of Restricted Boltzmann Machines. However, both methods use an approximate method for sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for stochastic gradient estimates of individual data points. It is well known that CD yields a biased gradient estimate. In this paper we however show empirically that CD has a lower stochastic gradient estimate variance than exact sampling, while the mean of subsequent PCD estimates has a higher variance than exact sampling. The results give one explanation to the finding that CD can be used with smaller minibatches or higher learning rates than PCD.

artificial intelligence, machine learning, variance, (15 more...)

1312.6002

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.39)