AITopics | Bayesian Learning

Collaborating Authors

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Sequential Model-Based Ensemble Optimization

Lacoste, Alexandre, Larochelle, Hugo, Laviolette, François, Marchand, Mario

arXiv.org Machine LearningFeb-4-2014

One of the most tedious tasks in the application of machine learning is model selection, i.e. hyperparameter selection. Fortunately, recent progress has been made in the automation of this process, through the use of sequential model-based optimization (SMBO) methods. This can be used to optimize a cross-validation performance of a learning algorithm over the value of its hyperparameters. However, it is well known that ensembles of learned models almost consistently outperform a single model, even if properly selected. In this paper, we thus propose an extension of SMBO methods that automatically constructs such ensembles. This method builds on a recently proposed ensemble construction paradigm known as agnostic Bayesian learning. In experiments on 22 regression and 39 classification data sets, we confirm the success of this proposed approach, which is able to outperform model selection with SMBO.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1402.0796

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Add feedback

Discovering Latent Network Structure in Point Process Data

Linderman, Scott W., Adams, Ryan P.

arXiv.org Machine LearningFeb-4-2014

Networks play a central role in modern data analysis, enabling us to reason about systems by studying the relationships between their parts. Most often in network analysis, the edges are given. However, in many systems it is difficult or impossible to measure the network directly. Examples of latent networks include economic interactions linking financial instruments and patterns of reciprocity in gang violence. In these cases, we are limited to noisy observations of events associated with each node. To enable analysis of these implicit networks, we develop a probabilistic model that combines mutually-exciting point processes with random graph models. We show how the Poisson superposition principle enables an elegant auxiliary variable formulation and a fully-Bayesian, parallel inference algorithm. We evaluate this new model empirically on several datasets.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1402.0914

Country: North America > United States > Illinois (0.15)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Banking & Finance (1.00)
(2 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

A high-reproducibility and high-accuracy method for automated topic classification

Lancichinetti, Andrea, Sirer, M. Irmak, Wang, Jane X., Acuna, Daniel, Körding, Konrad, Amaral, Luís A. Nunes

arXiv.org Machine LearningFeb-3-2014

Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in topic classification. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results which are not accurate in inferring the most suitable model parameters. Adapting approaches for community detection in networks, we propose a new algorithm which displays high-reproducibility and high-accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable.

likelihood, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

1402.0422

Country: North America > United States > Illinois (0.28)

Genre:

Research Report (1.00)
Workflow (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements

Zhe, Shandian, Qi, Yuan, Park, Youngja, Molloy, Ian, Chari, Suresh

arXiv.org Machine LearningFeb-1-2014

Infinite Tucker Decomposition (InfTucker) and random function prior models, as nonparametric Bayesian models on infinite exchangeable arrays, are more powerful models than widely-used multilinear factorization methods including Tucker and PARAFAC decomposition, (partly) due to their capability of modeling nonlinear relationships between array elements. Despite their great predictive performance and sound theoretical foundations, they cannot handle massive data due to a prohibitively high training time. To overcome this limitation, we present Distributed Infinite Tucker (DINTUCKER), a large-scale nonlinear tensor decomposition algorithm on MAPREDUCE. While maintaining the predictive accuracy of InfTucker, it is scalable on massive data. DINTUCKER is based on a new hierarchical Bayesian model that enables local training of InfTucker on subarrays and information integration from all local training results. We use distributed stochastic gradient descent, coupled with variational inference, to train this model. We apply DINTUCKER to multidimensional arrays with billions of elements from applications in the "Read the Web" project (Carlson et al., 2010) and in information security and compare it with the state-of-the-art large-scale tensor decomposition method, GigaTensor. On both datasets, DINTUCKER achieves significantly higher prediction accuracy with less computational time.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1311.2663

Genre: Research Report (0.50)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Marginal and simultaneous predictive classification using stratified graphical models

Nyman, Henrik, Xiong, Jie, Pensar, Johan, Corander, Jukka

arXiv.org Machine LearningJan-31-2014

Supervised classification is one of the most common tasks considered in machine learning and statistics (Bishop, 2007; Duda et al., 2000; Hastie et al., 2009; Ripley, 1996), with a wide variety of applications over practically all fields of science and engineering. Today, there exists a myriad of different classification methods, out of which those based on probabilistic models are widely accepted as the most sensible way to solve classification problems. Probabilistic methods are often themselves classified as either generative or discriminative, depending on whether one directly models the class posterior distribution (discriminative classifiers) or first the joint distribution of observed features (variables) conditional on class training data and then the posterior distribution of labels is obtained through Bayes' rule. There has been a debate around which of these approaches should be preferred in a particular application, see Ripley (1996), Hastie et al. (2009), Bishop (2007), and Pernkopf and Bilmes (2005), however, both classes of methods continue to be supported and further developed. One of the popular methods of probabilistic classification is based on encoding feature dependencies with Bayesian networks (Friedman et al., 1997). Such models can often represent data structures more faithfully than the naive Bayes classifier, which has been shown to yield dramatic improvements in classification accuracy in some cases. Numerous variants and extensions of the original framework introduced by Friedman et al. (1997) have been considered over the years, e.g.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1007/s11634-015-0199-5

1401.8078

Country: Europe > Finland (0.14)

Genre: Research Report (0.50)

Industry: Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Tempering by Subsampling

van de Meent, Jan-Willem, Paige, Brooks, Wood, Frank

arXiv.org Machine LearningJan-28-2014

In this paper we demonstrate that tempering Markov chain Monte Carlo samplers for Bayesian models by recursively subsampling observations without replacement can improve the performance of baseline samplers in terms of effective sample size per computation. We present two tempering by subsampling algorithms, subsampled parallel tempering and subsampled tempered transitions. We provide an asymptotic analysis of the computational cost of tempering by subsampling, verify that tempering by subsampling costs less than traditional tempering, and demonstrate both algorithms on Bayesian approaches to learning the mean of a high dimensional multivariate Normal and estimating Gaussian process hyperparameters.

artificial intelligence, machine learning, sampler, (17 more...)

arXiv.org Machine Learning

1401.7145

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Bayesian Properties of Normalized Maximum Likelihood and its Fast Computation

Barron, Andrew, Roos, Teemu, Watanabe, Kazuho

arXiv.org Machine LearningJan-28-2014

The normalized maximized likelihood (NML) provides the minimax regret solution in universal data compression, gambling, and prediction, and it plays an essential role in the minimum description length (MDL) method of statistical modeling and estimation. Here we show that the normalized maximum likelihood has a Bayes-like representation as a mixture of the component models, even in finite samples, though the weights of linear combination may be both positive and negative. This representation addresses in part the relationship between MDL and Bayes modeling. This representation has the advantage of speeding the calculation of marginals and conditionals required for coding and prediction applications.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Machine Learning

1401.7116

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

Montúfar, Guido F.

arXiv.org Machine LearningJan-28-2014

A deep belief network (DBN) (Hinton et al., 2006) is a layered stochastic network with undirected bipartite interactions between the units in the top two layers, and directed bipartite interactions between the units in all other subsequent pairs of layers, directed towards the bottom layer. The top two layers form a restricted Boltzmann machine (RBM) (Smolensky, 1986). The entire network defines a model of probability distributions on the states of the units in the bottom layer, the visible layer. When the number of units in every layer has the same order of magnitude, the network is called narrow . The depth refers to the number of layers. Deep network architectures are believed to play a key role in information processing of intelligent agents, see (Bengio, 2009) for an overview on this exciting topic. DBNs were the first deep architectures to be envisaged together with an efficient unsupervised training algorithm (Hinton et al., 2006).

artificial intelligence, machine learning, probability distribution, (17 more...)

arXiv.org Machine Learning

1303.7461

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Add feedback

Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs

He, Yangbo, Jia, Jinzhu, Yu, Bin

arXiv.org Machine LearningJan-27-2014

Graphical models are popular statistical tools which are used to represent dependent or causal complex systems. Statistically equivalent causal or directed graphical models are said to belong to a Markov equivalent class. It is of great interest to describe and understand the space of such classes. However, with currently known algorithms, sampling over such classes is only feasible for graphs with fewer than approximately 20 vertices. In this paper, we design reversible irreducible Markov chains on the space of Markov equivalent classes by proposing a perfect set of operators that determine the transitions of the Markov chain. The stationary distribution of a proposed Markov chain has a closed form and can be computed easily. Specifically, we construct a concrete perfect set of operators on sparse Markov equivalence classes by introducing appropriate conditions on each possible operator. Algorithms and their accelerated versions are provided to efficiently generate Markov chains and to explore properties of Markov equivalence classes of sparse directed acyclic graphs (DAGs) with thousands of vertices. We find experimentally that in most Markov equivalence classes of sparse DAGs, (1) most edges are directed, (2) most undirected subgraphs are small and (3) the number of these undirected subgraphs grows approximately linearly with the number of vertices. The article contains supplement arXiv:1303.0632, http://dx.doi.org/10.1214/13-AOS1125SUPP

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Machine Learning

doi: 10.1214/13-AOS1125

1209.586

Country: North America > United States > California (1.00)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.97)

Add feedback

Community Detection in Networks using Graph Distance

Bhattacharyya, Sharmodeep, Bickel, Peter J.

arXiv.org Machine LearningJan-24-2014

The study of networks has received increased attention recently not only from the social sciences and statistics but also from physicists, computer scientists and mathematicians. One of the principal problem in networks is community detection. Many algorithms have been proposed for community finding but most of them do not have have theoretical guarantee for sparse networks and networks close to the phase transition boundary proposed by physicists. There are some exceptions but all have some incomplete theoretical basis. Here we propose an algorithm based on the graph distance of vertices in the network. We give theoretical guarantees that our method works in identifying communities for block models and can be extended for degree-corrected block models and block models with the number of communities growing with number of vertices. Despite favorable simulation results, we are not yet able to conclude that our method is satisfactory for worst possible case. We illustrate on a network of political blogs, Facebook networks and some other networks.

data mining, machine learning, vertex, (18 more...)

arXiv.org Machine Learning

1401.3915

Country: North America > United States (0.92)

Genre: Research Report (0.40)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback