AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family

de Brébisson, Alexandre, Vincent, Pascal

arXiv.org Machine LearningFeb-28-2016

In a multi-class classification problem, it is standard to model the output of a neural network as a categorical distribution conditioned on the inputs. The output must therefore be positive and sum to one, which is traditionally enforced by a softmax. This probabilistic mapping allows to use the maximum likelihood principle, which leads to the well-known log-softmax loss. However the choice of the softmax function seems somehow arbitrary as there are many other possible normalizing functions. It is thus unclear why the log-softmax loss would perform better than other loss alternatives. In particular Vincent et al. (2015) recently introduced a class of loss functions, called the spherical family, for which there exists an efficient algorithm to compute the updates of the output weights irrespective of the output size. In this paper, we explore several loss functions from this family as possible alternatives to the traditional log-softmax. In particular, we focus our investigation on spherical bounds of the log-softmax loss and on two spherical log-likelihood losses, namely the log-Spherical Softmax suggested by Vincent et al. (2015) and the log-Taylor Softmax that we introduce. Although these alternatives do not yield as good results as the log-softmax loss on two language modeling tasks, they surprisingly outperform it in our experiments on MNIST and CIFAR-10, suggesting that they might be relevant in a broad range of applications.

artificial intelligence, machine learning, softmax, (19 more...)

arXiv.org Machine Learning

1511.05042

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

A Bayesian baseline for belief in uncommon events

Palonen, V.

arXiv.org Machine LearningFeb-27-2016

The plausibility of uncommon events and miracles based on testimony of such an event has been much discussed. When analyzing the probabilities involved, it has mostly been assumed that the common events can be taken as data in the calculations. However, we usually have only testimonies for the common events. While this difference does not have a significant effect on the inductive part of the inference, it has a large influence on how one should view the reliability of testimonies. In this work, a full Bayesian solution is given for the more realistic case, where one has a large number of testimonies for a common event and one testimony for an uncommon event. It is seen that, in order for there to be a large amount of testimonies for a common event, the testimonies will probably be quite reliable. For this reason, because the testimonies are quite reliable based on the testimonies for the common events, the probability for the uncommon event, given a testimony for it, is also higher. Hence, one should be more open-minded when considering the plausibility of uncommon events.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1602.07836

Country: Europe > Finland (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)

Add feedback

A 'Gibbs-Newton' Technique for Enhanced Inference of Multivariate Polya Parameters and Topic Models

Khalifa, Osama, Corne, David Wolfe, Chantler, Mike

arXiv.org Machine LearningFeb-27-2016

Hyper-parameters play a major role in the learning and inference process of latent Dirichlet allocation (LDA). In order to begin the LDA latent variables learning process, these hyper-parameters values need to be pre-determined. We propose an extension for LDA that we call 'Latent Dirichlet allocation Gibbs Newton' (LDA-GN), which places non-informative priors over these hyper-parameters and uses Gibbs sampling to learn appropriate values for them. At the heart of LDA-GN is our proposed 'Gibbs-Newton' algorithm, which is a new technique for learning the parameters of multivariate Polya distributions. We report Gibbs-Newton performance results compared with two prominent existing approaches to the latter task: Minka's fixed-point iteration method and the Moments method. We then evaluate LDA-GN in two ways: (i) by comparing it with standard LDA in terms of the ability of the resulting topic models to generalize to unseen documents; (ii) by comparing it with standard LDA in its performance on a binary classification task.

lda-gn, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1510.06646

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)

Add feedback

Multivariate response and parsimony for Gaussian cluster-weighted models

Dang, Utkarsh J., Punzo, Antonio, McNicholas, Paul D., Ingrassia, Salvatore, Browne, Ryan P.

arXiv.org Machine LearningFeb-26-2016

A family of parsimonious Gaussian cluster-weighted models is presented. This family concerns a multivariate extension to cluster-weighted modelling that can account for correlations between multivariate responses. Parsimony is attained by constraining parts of an eigen-decomposition imposed on the component covariance matrices. A sufficient condition for identifiability is provided and an expectation-maximization algorithm is presented for parameter estimation. Model performance is investigated on both synthetic and classical real data sets and compared with some popular approaches. Finally, accounting for linear dependencies in the presence of a linear regression structure is shown to offer better performance, vis-\`{a}-vis clustering, over existing methodologies.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1411.056

Country:

North America > United States (1.00)
Europe (0.67)
North America > Canada > Ontario (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Learning Gaussian Graphical Models With Fractional Marginal Pseudo-likelihood

Leppä-aho, Janne, Pensar, Johan, Roos, Teemu, Corander, Jukka

arXiv.org Machine LearningFeb-25-2016

We propose a Bayesian approximate inference method for learning the dependence structure of a Gaussian graphical model. Using pseudo-likelihood, we derive an analytical expression to approximate the marginal likelihood for an arbitrary graph structure without invoking any assumptions about decomposability. The majority of the existing methods for learning Gaussian graphical models are either restricted to decomposable graphs or require specification of a tuning parameter that may have a substantial impact on learned structures. By combining a simple sparsity inducing prior for the graph structures with a default reference prior for the model parameters, we obtain a fast and easily applicable scoring function that works well for even high-dimensional data. We demonstrate the favourable performance of our approach by large-scale comparisons against the leading methods for learning non-decomposable Gaussian graphical models. A theoretical justification for our method is provided by showing that it yields a consistent estimator of the graph structure.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1016/j.ijar.2017.01.001

1602.07863

Country: Europe > Finland (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Asymptotic consistency and order specification for logistic classifier chains in multi-label learning

Teisseyre, Paweł

arXiv.org Machine LearningFeb-24-2016

Machine Learning manuscript No. (will be inserted by the editor)Asymptotic consistency and order specification for logistic classifier chains in multi-label learning Paweł T eisseyre Received: date / Accepted: date Abstract Classifier chains are popular and effective method to tackle a multi-label classification problem. The aim of this paper is to study the asymptotic properties of the chain model in which the conditional probabilities are of the logistic form. In particular we find conditions on the number of labels and the distribution of feature vector under which the estimated mode of the joint distribution of labels converges to the true mode. Best of our knowledge, this important issue has not yet been studied in the context of multi-label learning. We also investigate how the order of model building in a chain influences the estimation of the joint distribution of labels. We establish the link between the problem of incorrect ordering in the chain and incorrect model specification. We propose a procedure of determining the optimal ordering of labels in the chain, which is based on using measures of correct specification and allows to find the ordering such that the consecutive logistic models are best possibly specified. The other important question raised in this paper is how accurately can we estimate the joint posterior probability when the ordering of labels is wrong or the logistic models in the chain are incorrectly specified. The numerical experiments illustrate the theoretical results. Keywords classifier chains· logistic regression· joint mode estimation· label ordering· asymptotic consistency 1 Introduction In multi-label classification the task is to automatically assign an object to multiple categories based on its characteristics. Each object of our interest is described by a feature vector x belonging to p-dimensional space and vector of K labels y ( y 1,..., y K)′ . In this paper we consider binary labels such thaty k 1 indicates that the considered object belongs to k-th category or has the k-th property. The issue has recently attracted significant attention, motivated by an increasing number of applications such as image and video annotationPaweł Teisseyre Institute of Computer Science, Polish Academy of Sciences Jana Kazimierza 5 01-248 Warsaw, Poland Tel.: 48-22-380-05-55 Email: teisseyrep@ipipan.waw.pl

artificial intelligence, logistic model, machine learning, (15 more...)

arXiv.org Machine Learning

1602.07466

Country: Europe > Poland > Masovia Province > Warsaw (0.24)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Max-Margin Nonparametric Latent Feature Models for Link Prediction

Zhu, Jun, Song, Jiaming, Chen, Bei

arXiv.org Machine LearningFeb-24-2016

Link prediction is a fundamental task in statistical network analysis. Recent advances have been made on learning flexible nonparametric Bayesian latent feature models for link prediction. In this paper, we present a max-margin learning method for such nonparametric latent feature relational models. Our approach attempts to unite the ideas of max-margin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction. It inherits the advances of nonparametric Bayesian methods to infer the unknown latent social dimension, while for discriminative link prediction, it adopts the max-margin learning principle by minimizing a hinge-loss using the linear expectation operator, without dealing with a highly nonlinear link likelihood function. For posterior inference, we develop an efficient stochastic variational inference algorithm under a truncated mean-field assumption. Our methods can scale up to large-scale real networks with millions of entities and tens of millions of positive links. We also provide a full Bayesian formulation, which can avoid tuning regularization hyper-parameters. Experimental results on a diverse range of real datasets demonstrate the benefits inherited from max-margin learning and Bayesian nonparametric inference.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1602.07428

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(2 more...)

Add feedback

Recurrent Gaussian Processes

Mattos, César Lincoln C., Dai, Zhenwen, Damianou, Andreas, Forth, Jeremy, Barreto, Guilherme A., Lawrence, Neil D.

arXiv.org Machine LearningFeb-24-2016

We define Recurrent Gaussian Processes (RGP) models, a general family of Bayesian nonparametric models with recurrent GP priors which are able to learn dynamical patterns from sequential data. Similar to Recurrent Neural Networks (RNNs), RGPs can have different formulations for their internal states, distinct inference methods and be extended with deep structures. In such context, we propose a novel deep RGP model whose autoregressive states are latent, thereby performing representation and dynamical learning simultaneously. To fully exploit the Bayesian nature of the RGP model we develop the Recurrent Variational Bayes (REVARB) framework, which enables efficient inference and strong regularization through coherent propagation of uncertainty across the RGP layers and states. We also introduce a RGP extension where variational parameters are greatly reduced by being reparametrized through RNN-based sequential recognition models. We apply our model to the tasks of nonlinear system identification and human motion modeling. The promising obtained results indicate that our RGP model maintains its highly flexibility while being able to avoid overfitting and being applicable even when larger datasets are not available.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1511.06644

Country:

South America > Brazil (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Dynamic Filtering of Time-Varying Sparse Signals via l1 Minimization

Charles, Adam, Balavoine, Aurele, Rozell, Christopher

arXiv.org Machine LearningFeb-22-2016

Despite the importance of sparsity signal models and the increasing prevalence of high-dimensional streaming data, there are relatively few algorithms for dynamic filtering of time-varying sparse signals. Of the existing algorithms, fewer still provide strong performance guarantees. This paper examines two algorithms for dynamic filtering of sparse signals that are based on efficient l1 optimization methods. We first present an analysis for one simple algorithm (BPDN-DF) that works well when the system dynamics are known exactly. We then introduce a novel second algorithm (RWL1-DF) that is more computationally complex than BPDN-DF but performs better in practice, especially in the case where the system dynamics model is inaccurate. Robustness to model inaccuracy is achieved by using a hierarchical probabilistic data model and propagating higher-order statistics from the previous estimate (akin to Kalman filtering) in the sparse inference process. We demonstrate the properties of these algorithms on both simulated data as well as natural video sequences. Taken together, the algorithms presented in this paper represent the first strong performance analysis of dynamic filtering algorithms for time-varying sparse signals as well as state-of-the-art performance in this emerging application.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2016.2586745

1507.06145

Country: North America > United States (0.67)

Genre: Research Report (0.84)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Learning to classify with possible sensor failures

Xie, Tianpei, Nasrabadi, Nasser M., Hero, Alfred O.

arXiv.org Machine LearningFeb-22-2016

Large margin classifiers, such as the support vector machine (SVM) [1] and the maximum entropy discrimination (MED) classifier [2], have enjoyed great popularity in the signal processing and machine learning communities due to their broad applicability, robust performance, and the availability of fast software implementations. When the training data is representative of the test data, the performance of MED/SVM has theoretical guarantees that have been validated in practice [1], [3], [4]. Moreover, since the decision boundary of the MED/SVM is solely defined by a few support vectors, the algorithm can tolerate random feature distortions and perturbations. However, in many real applications, anomalous measurements are inherent to the data set due to strong environmental noise or possible sensor failures. Such anomalies arise in industrial process monitoring, video surveillance, tactical multi-modal sensing, and, more generally, any application that involves unattended sensors in difficult environments (Figure 1).

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/ICASSP.2014.6854029

1507.0454

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback