AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Embarrassingly Parallel Variational Inference in Nonconjugate Models

Neiswanger, Willie, Wang, Chong, Xing, Eric

arXiv.org Machine LearningOct-14-2015

We develop a parallel variational inference (VI) procedure for use in data-distributed settings, where each machine only has access to a subset of data and runs VI independently, without communicating with other machines. This type of "embarrassingly parallel" procedure has recently been developed for MCMC inference algorithms; however, in many cases it is not possible to directly extend this procedure to VI methods without requiring certain restrictive exponential family conditions on the form of the model. Furthermore, most existing (nonparallel) VI methods are restricted to use on conditionally conjugate models, which limits their applicability. To combat these issues, we make use of the recently proposed nonparametric VI to facilitate an embarrassingly parallel VI procedure that can be applied to a wider scope of models, including to nonconjugate models. We derive our embarrassingly parallel VI algorithm, analyze our method theoretically, and demonstrate our method empirically on a few nonconjugate models.

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1510.04163

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)

Add feedback

VB calibration to improve the interface between phone recognizer and i-vector extractor

Brümmer, Niko

arXiv.org Machine LearningOct-14-2015

The EM training algorithm of the classical i-vector extractor is often incorrectly described as a maximum-likelihood method. The i-vector model is however intractable: the likelihood itself and the hidden-variable posteriors needed for the EM algorithm cannot be computed in closed form. We show here that the classical i-vector extractor recipe is actually a mean-field variational Bayes (VB) recipe. This theoretical VB interpretation turns out to be of further use, because it also offers an interpretation of the newer phonetic i-vector extractor recipe, thereby unifying the two flavours of extractor. More importantly, the VB interpretation is also practically useful: it suggests ways of modifying existing i-vector extractors to make them more accurate. In particular, in existing methods, the approximate VB posterior for the GMM states is fixed, while only the parameters of the generative model are adapted. Here we explore the possibility of also mildly adjusting (calibrating) those posteriors, so that they better fit the generative model.

artificial intelligence, machine learning, posterior, (17 more...)

arXiv.org Machine Learning

1510.03203

Country: Africa (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Backhaul-Constrained Multi-Cell Cooperation Leveraging Sparsity and Spectral Clustering

Jain, Swayambhoo, Kim, Seung-Jun, Giannakis, Georgios B.

arXiv.org Machine LearningOct-14-2015

Multi-cell cooperative processing with limited backhaul traffic is studied for cellular uplinks. Aiming at reduced backhaul overhead, a sparsity-regularized multi-cell receive-filter design problem is formulated. Both unstructured distributed cooperation as well as clustered cooperation, in which base station groups are formed for tight cooperation, are considered. Dynamic clustered cooperation, where the sparse equalizer and the cooperation clusters are jointly determined, is solved via alternating minimization based on spectral clustering and group-sparse regression. Furthermore, decentralized implementations of both unstructured and clustered cooperation schemes are developed for scalability, robustness and computational efficiency. Extensive numerical tests verify the efficacy of the proposed methods.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1409.8359

Country:

Europe (1.00)
North America > United States > Maryland (0.46)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.50)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Language Models for Image Captioning: The Quirks and What Works

Devlin, Jacob, Cheng, Hao, Fang, Hao, Gupta, Saurabh, Deng, Li, He, Xiaodong, Zweig, Geoffrey, Mitchell, Margaret

arXiv.org Artificial IntelligenceOct-14-2015

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.

caption, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1505.01809

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Lifted Relational Neural Networks

Sourek, Gustav, Aschenbrenner, Vojtech, Zelezny, Filip, Kuzelka, Ondrej

arXiv.org Artificial IntelligenceOct-13-2015

We propose a method combining relational-logic representations with neural network learning. A general lifted architecture, possibly reflecting some background domain knowledge, is described through relational rules which may be handcrafted or learned. The relational rule-set serves as a template for unfolding possibly deep neural networks whose structures also reflect the structures of given training or testing relational examples. Different networks corresponding to different examples share their weights, which co-evolve during training by stochastic gradient descent algorithm. The framework allows for hierarchical relational modeling constructs and learning of latent relational concepts through shared hidden layers weights corresponding to the rules. Discovery of notable relational concepts and experiments on 78 relational learning benchmarks demonstrate favorable performance of the method.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

1508.05128

Country: Europe > Czechia (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

On the Robustness of Regularized Pairwise Learning Methods Based on Kernels

Christmann, Andreas, Zhou, Ding-Xuan

arXiv.org Machine LearningOct-12-2015

Regularized empirical risk minimization including support vector machines plays an important role in machine learning theory. In this paper regularized pairwise learning (RPL) methods based on kernels will be investigated. One example is regularized minimization of the error entropy loss which has recently attracted quite some interest from the viewpoint of consistency and learning rates. This paper shows that such RPL methods have additionally good statistical robustness properties, if the loss function and the kernel are chosen appropriately. We treat two cases of particular interest: (i) a bounded and non-convex loss function and (ii) an unbounded convex loss function satisfying a certain Lipschitz type condition.

artificial intelligence, loss function, machine learning, (17 more...)

arXiv.org Machine Learning

1510.03267

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.54)

Add feedback

The intrinsic value of HFO features as a biomarker of epileptic activity

Gliske, Stephen V., Moon, Kevin R., Stacey, William C., Hero, Alfred O. III

arXiv.org Machine LearningOct-12-2015

High frequency oscillations (HFOs) are a promising biomarker of epileptic brain tissue and activity. HFOs additionally serve as a prototypical example of challenges in the analysis of discrete events in high-temporal resolution, intracranial EEG data. Two primary challenges are 1) dimensionality reduction, and 2) assessing feasibility of classification. Dimensionality reduction assumes that the data lie on a manifold with dimension less than that of the feature space. However, previous HFO analyses have assumed a linear manifold, global across time, space (i.e. recording electrode/channel), and individual patients. Instead, we assess both a) whether linear methods are appropriate and b) the consistency of the manifold across time, space, and patients. We also estimate bounds on the Bayes classification error to quantify the distinction between two classes of HFOs (those occurring during seizures and those occurring due to other processes). This analysis provides the foundation for future clinical use of HFO features and buides the analysis for other discrete events, such as individual action potentials or multi-unit activity.

artificial intelligence, intrinsic dimension, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/ICASSP.2016.7472887

1510.03507

Country: North America > United States > Michigan (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

A Markov Jump Process for More Efficient Hamiltonian Monte Carlo

Berger, Andrew B., Mudigonda, Mayur, DeWeese, Michael R., Sohl-Dickstein, Jascha

arXiv.org Machine LearningOct-11-2015

In most sampling algorithms, including Hamiltonian Monte Carlo, transition rates between states correspond to the probability of making a transition in a single time step, and are constrained to be less than or equal to 1. We derive a Hamiltonian Monte Carlo algorithm using a continuous time Markov jump process, and are thus able to escape this constraint. Transition rates in a Markov jump process need only be non-negative. We demonstrate that the new algorithm leads to improved mixing for several example problems, both by evaluating the spectral gap of the Markov operator, and by computing autocorrelation as a function of compute time. We release the algorithm as an open source Python package.

artificial intelligence, machine learning, monte carlo, (17 more...)

arXiv.org Machine Learning

1509.03808

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA

Li, Chun-Liang, Lin, Hsuan-Tien, Lu, Chi-Jen

arXiv.org Machine LearningOct-11-2015

We study the problem of recovering the subspace spanned by the first $k$ principal components of $d$-dimensional data under the streaming setting, with a memory bound of $O(kd)$. Two families of algorithms are known for this problem. The first family is based on the framework of stochastic gradient descent. Nevertheless, the convergence rate of the family can be seriously affected by the learning rate of the descent steps and deserves more serious study. The second family is based on the power method over blocks of data, but setting the block size for its existing algorithms is not an easy task. In this paper, we analyze the convergence rate of a representative algorithm with decayed learning rate (Oja and Karhunen, 1985) in the first family for the general $k>1$ case. Moreover, we propose a novel algorithm for the second family that sets the block sizes automatically and dynamically with faster convergence rate. We then conduct empirical studies that fairly compare the two families on real-world data. The studies reveal the advantages and disadvantages of these two families.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1506.0149

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

p-Markov Gaussian Processes for Scalable and Expressive Online Bayesian Nonparametric Time Series Forecasting

Samo, Yves-Laurent Kom, Roberts, Stephen J.

arXiv.org Machine LearningOct-9-2015

In this paper we introduce a novel online time series forecasting model we refer to as the pM-GP filter. We show that our model is equivalent to Gaussian process regression, with the advantage that both online forecasting and online learning of the hyper-parameters have a constant (rather than cubic) time complexity and a constant (rather than squared) memory requirement in the number of observations, without resorting to approximations. Moreover, the proposed model is expressive in that the family of covariance functions of the implied latent process, namely the spectral Matern kernels, have recently been proven to be capable of approximating arbitrarily well any translation-invariant covariance function. The benefit of our approach compared to competing models is demonstrated using experiments on several real-life datasets.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Machine Learning

1510.0283

Genre: Research Report (0.50)

Industry: Education (0.49)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.70)

Add feedback