AITopics

1505.02867

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.52)

Bardenet, Rémi, Doucet, Arnaud, Holmes, Chris

On Markov chain Monte Carlo methods for tall data

arXiv.org Machine Learning May-11-2015

Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number $n$ of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have been recently proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require fewer than $O(n)$ data-point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we have only been able so far to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.

artificial intelligence, iteration, machine learning, (17 more...)

1505.02827

Country:

Europe > United Kingdom (0.28)
North America > United States (0.28)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Cai, Diana, Ackerman, Nathanael, Freer, Cameron

An iterative step-function estimator for graphons

arXiv.org Machine Learning May-11-2015

Exchangeable graphs arise via a sampling procedure from measurable functions known as graphons. A natural estimation problem is how well we can recover a graphon given a single graph sampled from it. One general framework for estimating a graphon uses step-functions obtained by partitioning the nodes of the graph according to some clustering algorithm. We propose an iterative step-function estimator (ISFE) that, given an initial partition, iteratively clusters nodes based on their edge densities with respect to the previous iteration's partition. We analyze ISFE and demonstrate its performance in comparison with other graphon estimation techniques.
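The iteration described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' ISFE implementation: the greedy thresholded clustering rule, the mean-absolute-difference distance, the `threshold` value, and all function names are assumptions chosen for illustration. Only the overall loop (recompute each node's edge densities against the previous partition, then recluster) comes from the abstract.

```python
def edge_densities(adj, labels):
    """For each node, the fraction of each class's members it is adjacent to."""
    classes = sorted(set(labels))
    members = {c: [j for j, lab in enumerate(labels) if lab == c] for c in classes}
    return [[sum(adj[i][j] for j in members[c]) / len(members[c]) for c in classes]
            for i in range(len(adj))]


def recluster(densities, threshold):
    """Greedy clustering (an assumed rule): a node joins the nearest existing
    centroid if it lies within `threshold`, otherwise it opens a new class."""
    centroids, labels = [], []
    for d in densities:
        dists = [sum(abs(a - b) for a, b in zip(d, c)) / len(d) for c in centroids]
        if dists and min(dists) <= threshold:
            labels.append(dists.index(min(dists)))
        else:
            centroids.append(d)
            labels.append(len(centroids) - 1)
    return labels


def isfe_sketch(adj, labels, iterations=3, threshold=0.25):
    """Repeatedly recluster nodes by their densities w.r.t. the previous partition."""
    for _ in range(iterations):
        labels = recluster(edge_densities(adj, labels), threshold)
    return labels


def step_function(adj, labels):
    """Block edge densities on the final partition (labels must be 0..k-1)."""
    k = max(labels) + 1
    members = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    return [[sum(adj[i][j] for i in members[a] for j in members[b])
             / (len(members[a]) * len(members[b])) for b in range(k)]
            for a in range(k)]


# Toy graph: two disjoint cliques of 6 nodes each, no self-loops.
n = 6
adj = [[1 if i != j and (i < n) == (j < n) else 0 for j in range(2 * n)]
       for i in range(2 * n)]
initial = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1]  # deliberately mixed start
labels = isfe_sketch(adj, initial)
blocks = step_function(adj, labels)
```

On this toy graph the loop recovers the two cliques from the mixed starting partition, and `blocks` holds the within- and between-clique densities. Real graphs sampled from a graphon are noisy, so the threshold and the number of iterations would need tuning.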

artificial intelligence, graphon, machine learning, (18 more...)

1412.2129

Country: North America > United States > New York (0.28)

Genre: Research Report (0.82)

Industry:

Government > Military (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Le, Can M., Levina, Elizaveta, Vershynin, Roman

Optimization via Low-rank Approximation for Community Detection in Networks

Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here we propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community detection criteria, establishing its consistency for label estimation in each case and demonstrating the fundamental connection between spectral properties of the network and various model-based approaches to community detection. Simulations and applications to real-world data are included to demonstrate our method performs well for multiple problems over a wide range of parameters.

artificial intelligence, data mining, machine learning, (19 more...)

1406.0067

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Oceania > New Zealand (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Zhao, Qibin, Zhang, Liqing, Cichocki, Andrzej

Bayesian Sparse Tucker Models for Dimension Reduction and Tensor Completion

Tucker decomposition is the cornerstone of modern machine learning for tensorial data analysis and has attracted considerable attention for multiway feature extraction, compressive sensing, and tensor completion. The most challenging problem is determining model complexity (i.e., multilinear rank), especially when noise and missing data are present. In addition, existing methods cannot take into account uncertainty information of latent factors, resulting in low generalization performance. To address these issues, we present a class of probabilistic generative Tucker models for tensor decomposition and completion with structural sparsity over multilinear latent space. To exploit structural sparse modeling, we introduce two group sparsity inducing priors by hierarchical representation of Laplace and Student-t distributions, which facilitate full posterior inference. For model learning, we derive variational Bayesian inference over all model (hyper)parameters and develop efficient and scalable algorithms based on multilinear operations. Our methods can automatically adapt model complexity and infer an optimal multilinear rank by the principle of maximum lower bound of model evidence. Experimental results and comparisons on synthetic, chemometrics and neuroimaging data demonstrate remarkable performance of our models for recovering the ground-truth multilinear rank and missing entries.

completion, data mining, machine learning, (18 more...)

1505.02343

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (0.68)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Dai, Zhenwen, Hensman, James, Lawrence, Neil

Spike and Slab Gaussian Process Latent Variable Models

The Gaussian process latent variable model (GP-LVM) is a popular approach to non-linear probabilistic dimensionality reduction. One design choice for the model is the number of latent variables. We present a spike and slab prior for the GP-LVM and propose an efficient variational inference procedure that gives a lower bound of the log marginal likelihood. The new model provides a more principled approach for selecting latent dimensions than the standard way of thresholding the length-scale parameters. The effectiveness of our approach is demonstrated through experiments on real and simulated data. Further, we extend multi-view Gaussian processes that rely on sharing latent dimensions (known as manifold relevance determination) with spike and slab priors. This allows a more principled approach for selecting a subset of the latent space for each view of data. The extended model outperforms the previous state-of-the-art when applied to a cross-modal multimedia retrieval task.

artificial intelligence, latent dimension, machine learning, (13 more...)

1505.02434

Country:

North America > United States (0.68)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

Loog, Marco

Contrastive Pessimistic Likelihood Estimation for Semi-Supervised Classification

Improvement guarantees for semi-supervised classifiers can currently only be given under restrictive conditions on the data. We propose a general way to perform semi-supervised parameter estimation for likelihood-based classifiers for which, on the full training set, the estimates are never worse than the supervised solution in terms of the log-likelihood. We argue, moreover, that we may expect these solutions to really improve upon the supervised classifier in particular cases. In a worked-out example for LDA, we take it one step further and essentially prove that its semi-supervised version is strictly better than its supervised counterpart. The two new concepts that form the core of our estimation principle are contrast and pessimism. The former refers to the fact that our objective function takes the supervised estimates into account, enabling the semi-supervised solution to explicitly control the potential improvements over this estimate. The latter refers to the fact that our estimates are conservative and therefore resilient to whatever form the true labeling of the unlabeled data takes on. Experiments demonstrate the improvements in terms of both the log-likelihood and the classification error rate on independent test sets.

artificial intelligence, classifier, machine learning, (17 more...)

1503.00269

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Pilanci, Mert, Wainwright, Martin J.

Newton Sketch: A Linear-time Optimization Algorithm with Linear-Quadratic Convergence

arXiv.org Machine Learning May-9-2015

We propose a randomized second-order method for optimization known as the Newton Sketch: it is based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian. For self-concordant functions, we prove that the algorithm has super-linear convergence with exponentially high probability, with convergence and complexity guarantees that are independent of condition numbers and related problem-dependent quantities. Given a suitable initialization, similar guarantees also hold for strongly convex and smooth objectives without self-concordance. When implemented using randomized projections based on a sub-sampled Hadamard basis, the algorithm typically has substantially lower complexity than Newton's method. We also describe extensions of our methods to programs involving convex constraints that are equipped with self-concordant barriers. We discuss and illustrate applications to linear programs, quadratic programs with convex constraints, logistic regression and other generalized linear models, as well as semidefinite programs.
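The core idea in this abstract, a Newton step taken with a randomly projected Hessian, can be sketched in plain Python on the simplest possible objective. This is an illustrative sketch under stated assumptions, not the authors' algorithm: it uses least squares $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ (whose Hessian square root is just $A$), a dense Gaussian sketch instead of the sub-sampled Hadamard basis mentioned in the abstract, unit step size with no line search, and a two-column $A$ so the linear solve is a closed-form 2x2 inverse. The function name and parameters are hypothetical.

```python
import random


def sketched_newton_least_squares(A, b, sketch_rows, steps, seed=0):
    """Minimise 0.5 * ||A x - b||^2 for a two-column A using Newton steps
    whose Hessian A^T A is replaced by the sketched version (S A)^T (S A)."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0, 0.0]
    for _ in range(steps):
        # Exact gradient g = A^T (A x - b).
        r = [A[i][0] * x[0] + A[i][1] * x[1] - b[i] for i in range(n)]
        g = [sum(A[i][k] * r[i] for i in range(n)) for k in range(2)]
        # Fresh Gaussian sketch S (sketch_rows x n) applied to A, row by row.
        SA = []
        for _ in range(sketch_rows):
            s = [rng.gauss(0.0, 1.0) / sketch_rows ** 0.5 for _ in range(n)]
            SA.append([sum(s[i] * A[i][k] for i in range(n)) for k in range(2)])
        # Approximate 2x2 Hessian H_s = (S A)^T (S A).
        h00 = sum(row[0] * row[0] for row in SA)
        h01 = sum(row[0] * row[1] for row in SA)
        h11 = sum(row[1] * row[1] for row in SA)
        det = h00 * h11 - h01 * h01
        # Newton update x <- x - H_s^{-1} g via the closed-form 2x2 inverse.
        dx0 = (h11 * g[0] - h01 * g[1]) / det
        dx1 = (-h01 * g[0] + h00 * g[1]) / det
        x = [x[0] - dx0, x[1] - dx1]
    return x


# Toy problem: 200 noiseless points on the line b = 2 + 3 t.
n_points = 200
A = [[1.0, i / (n_points - 1)] for i in range(n_points)]
b = [2.0 + 3.0 * row[1] for row in A]
x_hat = sketched_newton_least_squares(A, b, sketch_rows=80, steps=30)
```

With 80 sketch rows for a 2-dimensional problem the sketched Hessian stays close to the true one, so the iterates should approach the coefficients (2, 3). A dense Gaussian sketch costs more than forming $A^T A$ directly; it is used here only to keep the example short, whereas the abstract attributes the complexity gains to structured (Hadamard-based) projections.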

artificial intelligence, machine learning, sketch, (18 more...)

1505.0225

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Jiang, Hansi, Meyer, Carl

Relations Between Adjacency and Modularity Graph Partitioning

arXiv.org Machine Learning May-9-2015

In this paper the exact linear relation between the leading eigenvector of the unnormalized modularity matrix and the eigenvectors of the adjacency matrix is developed. Based on this analysis a method to approximate the leading eigenvector of the modularity matrix is given, and the relative error of the approximation is derived. A complete proof of the equivalence between normalized modularity clustering and normalized adjacency clustering is also given. A new metric is defined to describe the agreement of two clustering methods, and some applications and experiments are given to illustrate and corroborate the points that are made in the theoretical development.
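For orientation, here is a short Python sketch of the unnormalized modularity matrix, assuming the standard definition $B = A - dd^T/(2m)$ with degree vector $d$ and edge count $m$ (the listing itself does not state the definition). Because $B$ differs from the adjacency matrix $A$ only by a rank-one term, the eigenvectors of the two matrices are closely tied, which is the kind of relation the abstract refers to. The function name and the toy graph are illustrative.

```python
def modularity_matrix(adj):
    """Unnormalized modularity matrix B = A - d d^T / (2m) for an
    undirected graph given as a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # sum of degrees equals twice the number of edges
    return [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)]
            for i in range(n)]


# Toy graph: the path 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
B = modularity_matrix(path)

# Every row of B sums to zero, so the all-ones vector is an eigenvector
# of B with eigenvalue 0.
row_sums = [sum(row) for row in B]
```

The zero row sums follow directly from the definition: row $i$ of $A$ sums to $d_i$, and row $i$ of $dd^T/(2m)$ sums to $d_i \cdot 2m/(2m) = d_i$.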

artificial intelligence, machine learning, matrix, (18 more...)

1505.03481

Country: North America > United States > North Carolina (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Samo, Yves-Laurent Kom, Roberts, Stephen

Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes

arXiv.org Machine Learning May-9-2015

In this paper we propose the first non-parametric Bayesian model using Gaussian Processes to make inference on Poisson Point Processes without resorting to gridding the domain or to introducing latent thinning points. Unlike competing models that scale cubically and have a squared memory requirement in the number of data points, our model has a linear complexity and memory requirement. We propose an MCMC sampler and show that our model is faster, more accurate and generates less correlated samples than competing models on both synthetic and real-life data. Finally, we show that our model easily handles data sizes not considered thus far by alternative approaches.

artificial intelligence, intensity function, machine learning, (12 more...)

1410.6834

Country: Europe (0.67)

Genre: Research Report (0.40)

Industry: Materials > Metals & Mining (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)