AITopics | Bayesian Learning

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)
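To make the definition concrete, here is a minimal sketch of how a DAG plus per-node conditional probability tables determines a joint distribution: the joint probability of an assignment is the product of each node's probability given its parents. The three-node network (`rain`, `sprinkler`, `wet`) and all numbers are hypothetical illustration values, not taken from any paper listed here.

```python
from itertools import product

# Hypothetical network: rain -> wet <- sprinkler (illustrative values only).
parents = {"rain": (), "sprinkler": (), "wet": ("rain", "sprinkler")}

# cpt[node][parent_values] = P(node = True | parents = parent_values)
cpt = {
    "rain": {(): 0.2},
    "sprinkler": {(): 0.1},
    "wet": {
        (False, False): 0.0,
        (False, True): 0.9,
        (True, False): 0.8,
        (True, True): 0.99,
    },
}


def joint_probability(assignment):
    """P(assignment) as the product of each node's conditional given its parents."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        p_true = cpt[node][parent_values]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


# Enumerate every assignment; a valid factorization must sum to 1.
nodes = list(parents)
all_assignments = [
    dict(zip(nodes, values)) for values in product([False, True], repeat=len(nodes))
]
total = sum(joint_probability(a) for a in all_assignments)
```

The factorization needs only one table per node, sized by that node's parents, rather than one entry per full joint assignment.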

Kaggle LSHTC4 Winning Solution

Puurula, Antti, Read, Jesse, Bifet, Albert

arXiv.org Artificial Intelligence, May-9-2014

Our winning submission to the 2014 Kaggle competition for Large Scale Hierarchical Text Classification (LSHTC) consists mostly of an ensemble of sparse generative models extending Multinomial Naive Bayes. The base classifiers are hierarchically smoothed models combining document, label, and hierarchy-level Multinomials, with feature pre-processing using variants of TF-IDF and BM25. Additional diversification is introduced by different types of folds and random search optimization for different measures. The ensemble algorithm optimizes the macro F-score by predicting the documents for each label, instead of the usual prediction of labels per document. Scores for documents are predicted by weighted voting of base-classifier outputs with a variant of Feature-Weighted Linear Stacking. The number of documents per label is chosen using label priors and thresholding of vote scores. This document describes the models and software used to build our solution. The results can be reproduced by running the scripts included in the Kaggle package. A package omitting precomputed result files is also distributed. All code is open source, released under GNU GPL 2.0, and GPL 3.0 for Weka and Meka dependencies.
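For readers unfamiliar with the baseline this ensemble extends, the following is a minimal sketch of plain Multinomial Naive Bayes with add-alpha (Laplace) smoothing. It is not the authors' hierarchically smoothed model and omits TF-IDF/BM25 weighting and the ensemble entirely; the function names `train_mnb` and `predict_mnb` and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit per-label log priors and smoothed log word likelihoods."""
    vocab = {word for tokens in docs for word in tokens}
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)

    model = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        denom = total_words + alpha * len(vocab)  # add-alpha smoothing
        log_prior = math.log(n_docs / len(docs))
        log_lik = {
            word: math.log((word_counts[label][word] + alpha) / denom)
            for word in vocab
        }
        model[label] = (log_prior, log_lik)
    return model


def predict_mnb(model, tokens):
    """Return the label with the highest posterior log score."""
    def score(label):
        log_prior, log_lik = model[label]
        # Words never seen in training are ignored.
        return log_prior + sum(log_lik[w] for w in tokens if w in log_lik)

    return max(model, key=score)


# Toy corpus (hypothetical).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "bank"],
    ["bank", "rate", "market"],
]
labels = ["sports", "sports", "finance", "finance"]
model = train_mnb(docs, labels)
```

Calling `predict_mnb(model, ["goal", "coach"])` scores each label by its log prior plus the summed log likelihoods of the tokens and returns the best-scoring label.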

classification, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

1405.0546

Country:

Oceania > New Zealand (0.14)
Asia > China (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations

González, Javier, Vujačić, Ivan, Wit, Ernst

arXiv.org Machine Learning, May-8-2014

Nonlinear systems of differential equations have attracted interest in fields such as systems biology, ecology, and biochemistry, due to their flexibility and their ability to describe dynamical systems. Despite the importance of such models in many branches of science, they have not been the focus of systematic statistical analysis until recently. In this work we propose a general approach to estimate the parameters of systems of differential equations measured with noise. Our methodology is based on the maximization of the penalized likelihood, where the system of differential equations is used as a penalty. To do so, we use a Reproducing Kernel Hilbert Space approach that allows us to formulate the estimation problem as an unconstrained numerical maximization problem that is easy to solve. The proposed method is tested with synthetically simulated data, and it is used to estimate the unobserved transcription factor CdaR in Streptomyces coelicolor using gene expression data of the genes it regulates.

Keywords: System of ordinary differential equations, differential operator, reproducing kernel Hilbert space, gene regulatory network

1. Introduction

Despite the fact that differential equations are a common modelling tool within science and engineering, statistical methods for estimating such models have only received widespread attention during the last few years. The difficulty of solving differential equations in general has been a major stumbling block for efficient statistical procedures.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1311.3576

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Factored Performance Functions with Structural Representation in Continuous Time Bayesian Networks

Sturlaugson, Liessman (Montana State University) | Sheppard, John W. (Montana State University)

AAAI Conferences, May-7-2014

The continuous time Bayesian network (CTBN) is a probabilistic graphical model that enables reasoning about complex, interdependent, and continuous-time subsystems. The model uses nodes to denote subsystems and arcs to denote conditional dependence. This dependence manifests in how the dynamics of a subsystem change based on the current states of its parents in the network. While the original CTBN definition allows users to specify the dynamics of how the system evolves, users might also want to place value expressions over the dynamics of the model in the form of performance functions. We formalize these performance functions for the CTBN and show how they can be factored in the same way as the network, allowing what we argue is a more intuitive and explicit representation. For cases in which a performance function must involve multiple nodes, we show how to augment the structure of the CTBN to account for the performance interaction while maintaining the factorization of a single performance function for each node.

continuous time bayesian network, factored performance function, structural representation

AAAI Conferences

The Twenty-Seventh International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

An Empirical Evaluation of Costs and Benefits of Simplifying Bayesian Networks by Removing Weak Arcs

Ratnapinda, Parot (University of Pittsburgh) | Druzdzel, Marek J. (University of Pittsburgh and Białystok University of Technology Białystok)

AAAI Conferences, May-7-2014

We report the results of an empirical evaluation of structural simplification of Bayesian networks by removing weak arcs. We conduct a series of experiments on six networks built from real data sets selected from the UC Irvine Machine Learning Repository. We systematically remove arcs from the weakest to the strongest, relying on four measures of arc strength, and measure the classification accuracy of the resulting simplified models. Our results show that removing up to roughly 20 percent of the weakest arcs in a network has minimal effect on its classification accuracy. At the same time, structural simplification of networks leads to significant reduction of both the amount of memory taken by the clique tree and the amount of computation needed to perform inference.
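The pruning step described in this abstract can be sketched as ranking arcs by a strength score and dropping the weakest fraction. The abstract does not say which four strength measures were used, so the scores below are placeholder numbers, and the function `remove_weakest_arcs` and the example arcs are hypothetical.

```python
def remove_weakest_arcs(arc_strengths, fraction):
    """Return the arcs that remain after dropping the weakest `fraction` of them.

    arc_strengths maps (parent, child) pairs to a non-negative strength score.
    """
    ranked = sorted(arc_strengths, key=arc_strengths.get)  # weakest first
    n_remove = int(len(ranked) * fraction)
    removed = set(ranked[:n_remove])
    return [arc for arc in arc_strengths if arc not in removed]


# Placeholder strengths for a five-arc network (illustrative values only).
arc_strengths = {
    ("age", "income"): 0.42,
    ("education", "income"): 0.57,
    ("age", "education"): 0.03,
    ("income", "credit"): 0.61,
    ("education", "credit"): 0.18,
}
kept = remove_weakest_arcs(arc_strengths, fraction=0.2)
```

With five arcs and a fraction of 0.2, exactly one arc (the weakest) is dropped. In a real experiment the conditional probability tables of the affected child nodes would also need to be re-estimated after removal.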

empirical evaluation, removing weak arc, simplifying bayesian network, (1 more...)

AAAI Conferences

The Twenty-Seventh International Flairs Conference

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

Why (and When and How) Contrastive Divergence Works

Fellows, Ian E

arXiv.org Machine Learning, May-3-2014

Contrastive divergence (CD) is a promising method of inference in high dimensional distributions with intractable normalizing constants; however, the theoretical foundations justifying its use are somewhat shaky. This document proposes a framework for understanding CD inference and how and when it works, and provides multiple justifications for the CD moment conditions, including framing them as a variational approximation. Algorithms for performing inference are discussed and are applied to social network data using exponential-family random graph models (ERGMs). The framework also provides guidance about how to construct MCMC kernels providing good CD inference, which turn out to be quite different from those typically used to provide fast global mixing.

artificial intelligence, machine learning, objective function, (18 more...)

arXiv.org Machine Learning

1405.0602

Genre: Research Report (0.84)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Markov Blanket Ranking using Kernel-based Conditional Dependence Measures

Strobl, Eric V., Visweswaran, Shyam

arXiv.org Machine Learning, May-2-2014

Developing feature selection algorithms that move beyond a purely correlational analysis to a more causal analysis of observational data is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step that variables must pass in order to be included in the conditioning set. As a result, these algorithms may not consider all possible conditional multivariate combinations. We address this limitation by proposing a backward elimination method that uses a kernel-based conditional dependence measure to identify the Markov blanket in a fully multivariate fashion. The algorithm is easy to implement and compares favorably to other methods on synthetic and real datasets.

artificial intelligence, machine learning, proposed-f and proposed-z, (14 more...)

arXiv.org Machine Learning

1402.0108

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Cover Tree Bayesian Reinforcement Learning

Tziortziotis, Nikolaos, Dimitrakakis, Christos, Blekas, Konstantinos

arXiv.org Machine Learning, May-2-2014

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.
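As background for the Thompson sampling component mentioned above, here is a minimal sketch in a much simpler setting than the paper's: a Bernoulli multi-armed bandit with Beta posteriors, not a context tree model over continuous states. The function name `thompson_bernoulli` and the arm reward probabilities are assumptions chosen for illustration.

```python
import random


def thompson_bernoulli(true_means, n_rounds, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior on each arm's mean
    failures = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Sample one plausible mean per arm from its posterior, then act greedily
        # with respect to the sample.
        samples = [rng.betavariate(a, b) for a, b in zip(successes, failures)]
        arm = max(range(n_arms), key=samples.__getitem__)

        reward = rng.random() < true_means[arm]
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls


# Hypothetical arms with success probabilities 0.2, 0.5 and 0.8.
pulls = thompson_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
```

Exploration here comes from posterior uncertainty: arms with few observations have wide posteriors and are occasionally sampled high, while well-explored poor arms are pulled less and less often.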

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1305.1809

Country:

Europe (0.93)
North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Exchangeable Variable Models

Niepert, Mathias, Domingos, Pedro

arXiv.org Artificial Intelligence, May-2-2014

A sequence of random variables is exchangeable if its joint distribution is invariant under variable permutations. We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable EVMs is optimal under zero-one loss for a large class of functions, including parity and threshold functions, and strictly subsumes existing tractable independence-based model families. Extensive experiments show that EVMs outperform state-of-the-art classifiers such as SVMs and probabilistic models that are based solely on independence assumptions.
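The definition of exchangeability in the first sentence can be checked by brute force on small binary sequences. The sketch below is only an illustration of that definition, not of the EVM model class; the helper names and the example distributions are assumptions. It also shows that a mixture of i.i.d. Bernoulli sequences is exchangeable even though its variables are not independent.

```python
from itertools import permutations, product


def is_exchangeable(joint, n, tol=1e-12):
    """Check that joint[x] is unchanged under every permutation of x's positions."""
    for x in product([0, 1], repeat=n):
        for perm in permutations(range(n)):
            permuted = tuple(x[i] for i in perm)
            if abs(joint[x] - joint[permuted]) > tol:
                return False
    return True


def bernoulli_mixture(n, weights, biases):
    """Joint of n binary variables drawn i.i.d. given a randomly chosen bias."""
    joint = {}
    for x in product([0, 1], repeat=n):
        k = sum(x)  # the probability depends on x only through its number of ones
        joint[x] = sum(
            w * p**k * (1 - p) ** (n - k) for w, p in zip(weights, biases)
        )
    return joint


def independent_joint(biases):
    """Joint of independent binary variables with possibly different biases."""
    joint = {}
    for x in product([0, 1], repeat=len(biases)):
        prob = 1.0
        for xi, p in zip(x, biases):
            prob *= p if xi else 1.0 - p
        joint[x] = prob
    return joint


mixture = bernoulli_mixture(3, weights=[0.5, 0.5], biases=[0.1, 0.9])
unequal = independent_joint([0.1, 0.5, 0.9])
```

Here `mixture` passes the check because each probability depends only on the count of ones, while `unequal` fails because swapping positions changes which bias applies.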

artificial intelligence, machine learning, sequence, (18 more...)

arXiv.org Artificial Intelligence

1405.0501

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Auto-Encoding Variational Bayes

Kingma, Diederik P, Welling, Max

arXiv.org Machine Learning, May-1-2014

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
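The reparameterization idea in the first contribution can be illustrated on a toy problem. The sketch below is not the paper's lower bound estimator: it only shows the core trick of writing a Gaussian sample as z = mu + sigma * eps with eps drawn from a standard normal, so that a Monte Carlo estimate of an expectation becomes differentiable in mu and sigma. The test function f(z) = z**2 and the name `reparam_gradients` are assumptions chosen because the exact answer is known: E[z**2] = mu**2 + sigma**2, with gradients 2*mu and 2*sigma.

```python
import random


def reparam_gradients(mu, sigma, n_samples=20000, seed=0):
    """Monte Carlo gradients of E[z**2], z ~ N(mu, sigma**2), via reparameterization."""
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # noise that does not depend on mu or sigma
        z = mu + sigma * eps        # deterministic transform of the noise
        df_dz = 2.0 * z             # derivative of f(z) = z**2
        grad_mu += df_dz            # chain rule: dz/dmu = 1
        grad_sigma += df_dz * eps   # chain rule: dz/dsigma = eps
    return grad_mu / n_samples, grad_sigma / n_samples


grad_mu, grad_sigma = reparam_gradients(mu=1.5, sigma=0.5)
```

The estimates should land close to the analytic values 3.0 and 1.0. Because the randomness is isolated in `eps`, the same pattern works with automatic differentiation when f is a neural network objective.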

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

1312.6114

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Comparative Evaluation of Link-Based Approaches for Candidate Ranking in Link-to-Wikipedia Systems

Fernandez Garcia, N., Arias Fisteus, J., Sanchez Fernandez, L.

Journal of Artificial Intelligence Research, Apr-30-2014

In recent years, the task of automatically linking pieces of text (anchors) mentioned in a document to Wikipedia articles that represent the meaning of these anchors has received extensive research attention. Typically, link-to-Wikipedia systems try to find a set of Wikipedia articles that are candidates to represent the meaning of the anchor and then rank these candidates to select the most appropriate one. In this ranking process the systems rely on context information obtained from the document where the anchor is mentioned and/or from Wikipedia. In this paper we focus on the use of Wikipedia links as context information. In particular, we offer a review of several state-of-the-art candidate ranking approaches that rely on Wikipedia link information. In addition, we provide a comparative empirical evaluation of the different approaches on five different corpora: the TAC 2010 corpus and four corpora built from actual Wikipedia articles and news items.

information, proceedings, query, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4129

AI Access Foundation

10877

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)
