AITopics | Bayesian Learning

Bayesian Learning

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). (Wikipedia)
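
As a worked illustration of the definition above: the DAG structure corresponds to a standard factorization of the joint distribution. The symbols $X_1, \dots, X_n$ (the variables) and $\mathrm{Pa}(X_i)$ (the parents of $X_i$ in the graph) are notation assumed here, not taken from the quoted text.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

For a hypothetical three-variable chain $A \to B \to C$, this reads $P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid B)$.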

Near-optimal Reinforcement Learning in Factored MDPs

Osband, Ian, Van Roy, Benjamin

Neural Information Processing Systems, Dec-31-2014

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a \emph{factored} MDP, it is possible to achieve regret that scales polynomially in the number of \emph{parameters} encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
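
The arithmetic behind the step from $\Omega(\sqrt{SAT})$ regret to $T = \Omega(SA)$ learning time can be sketched as follows. This is only an illustration with constants dropped: the function names, the target average regret `eps`, and the example system of 20 binary state variables are assumptions made here, not part of the paper.

```python
import math


def regret_order(num_states, num_actions, elapsed_time):
    """Order of the worst-case regret sqrt(S * A * T); constants are dropped."""
    return math.sqrt(num_states * num_actions * elapsed_time)


def time_for_average_regret(num_states, num_actions, eps):
    """Smallest T (up to constants) with sqrt(S * A * T) / T <= eps.

    Rearranging the inequality gives T >= S * A / eps**2.
    """
    return math.ceil(num_states * num_actions / eps ** 2)


# Hypothetical system: 20 binary state variables and 4 actions, so the
# flat state space has 2**20 states even though only 20 variables describe it.
num_states = 2 ** 20
num_actions = 4
print(time_for_average_regret(num_states, num_actions, eps=0.5))
```

The state count grows exponentially in the number of variables, which is why a bound that scales with the number of parameters of a factored representation can be so much smaller.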

Robust Bayesian Max-Margin Clustering

Chen, Changyou, Zhu, Jun, Zhang, Xinhua

Neural Information Processing Systems, Dec-31-2014

We present max-margin Bayesian clustering (BMC), a general and robust framework that incorporates the max-margin criterion into Bayesian clustering models, as well as two concrete models of BMC to demonstrate its flexibility and effectiveness in dealing with different clustering tasks. The Dirichlet process max-margin Gaussian mixture is a nonparametric Bayesian clustering model that relaxes the underlying Gaussian assumption of Dirichlet process Gaussian mixtures by incorporating max-margin posterior constraints, and is able to infer the number of clusters from data. We further extend the ideas to present a max-margin clustering topic model, which can learn the latent topic representation of each document while at the same time clustering documents in a max-margin fashion. Extensive experiments are performed on a number of real datasets, and the results indicate superior clustering performance of our methods compared to related baselines.

artificial intelligence, constraint, machine learning, (17 more...)

Country:

Asia (0.29)
Oceania > Australia (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

(Almost) No Label No Cry

Patrini, Giorgio, Nock, Richard, Rivera, Paul, Caetano, Tiberio

Neural Information Processing Systems, Dec-31-2014

In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known. This setting has broad practical relevance, in particular for privacy-preserving data processing. We first show that the mean operator, a statistic which aggregates all labels, is minimally sufficient for the minimization of many proper scoring losses with linear (or kernelized) classifiers without using labels. We provide a fast learning algorithm that estimates the mean operator via a manifold regularizer with guaranteed approximation bounds. Then, we present an iterative learning algorithm that uses this as initialization. We ground this algorithm in Rademacher-style generalization bounds that fit the LLP setting, introducing a generalization of Rademacher complexity and a Label Proportion Complexity measure. This latter algorithm optimizes tractable bounds for the corresponding bag-empirical risk. Experiments are provided on fourteen domains, whose size ranges up to 300K observations. They show that our algorithms are scalable and tend to consistently outperform the state of the art in LLP. Moreover, in many cases, our algorithms compete with, or are only a few percent of AUC away from, the Oracle that learns knowing all labels. On the largest domains, half a dozen proportions can suffice, i.e., roughly 40K times fewer than the total number of labels.

artificial intelligence, classifier, machine learning, (14 more...)

Country: Oceania > Australia (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Semi-Separable Hamiltonian Monte Carlo for Inference in Bayesian Hierarchical Models

Zhang, Yichuan, Sutton, Charles

Neural Information Processing Systems, Dec-30-2014, 18:00:00 GMT

Sampling from hierarchical Bayesian models is often difficult for MCMC methods, because of the strong correlations between the model parameters and the hyperparameters. Recent Riemannian manifold Hamiltonian Monte Carlo (RMHMC) methods have significant potential advantages in this setting, but are computationally expensive. We introduce a new RMHMC method, which we call semi-separable Hamiltonian Monte Carlo, which uses a specially designed mass matrix that allows the joint Hamiltonian over model parameters and hyperparameters to decompose into two simpler Hamiltonians. This structure is exploited by a new integrator which we call the alternating blockwise leapfrog algorithm. The resulting method can mix faster than simpler Gibbs sampling while being simpler and more efficient than previous instances of RMHMC.
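
For orientation, the standard leapfrog integrator used by plain Hamiltonian Monte Carlo can be sketched as below. This is not the alternating blockwise leapfrog algorithm of the paper: it assumes an identity mass matrix, and the function name, argument names, and the quadratic test potential are all illustrative choices made here.

```python
def leapfrog(position, momentum, grad_potential, step_size, num_steps):
    """Standard leapfrog integration of Hamiltonian dynamics.

    Assumes an identity mass matrix, so the kinetic energy is 0.5 * |p|^2.
    `grad_potential` maps a position (list of floats) to the gradient of the
    potential energy (the negative log density) at that position.
    """
    q, p = list(position), list(momentum)
    grad = grad_potential(q)
    for _ in range(num_steps):
        # Half step for momentum, full step for position, half step for momentum.
        p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad)]
        q = [qi + step_size * pi for qi, pi in zip(q, p)]
        grad = grad_potential(q)
        p = [pi - 0.5 * step_size * gi for pi, gi in zip(p, grad)]
    return q, p


def standard_normal_grad(q):
    """Gradient of the potential U(q) = 0.5 * |q|^2 (a standard normal target)."""
    return list(q)


q_new, p_new = leapfrog([1.0], [0.0], standard_normal_grad, step_size=0.1, num_steps=100)
print(q_new, p_new)
```

The integrator is time-reversible and approximately conserves the Hamiltonian, which is what keeps acceptance rates high in HMC; the paper's contribution concerns a different mass matrix and integrator built for hierarchical models.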

artificial intelligence, machine learning, monte carlo, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

A Bayesian encourages dropout

Maeda, Shin-ichi

arXiv.org Machine Learning, Dec-30-2014

Dropout is one of the key techniques for preventing overfitting during learning. It has been explained that dropout works as a kind of modified L2 regularization. Here, we shed light on dropout from a Bayesian standpoint. The Bayesian interpretation enables us to optimize the dropout rate, which is beneficial for learning the weight parameters and for prediction after learning. The experimental results also encourage optimizing the dropout rate.
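
To make the role of the dropout rate concrete, here is a minimal sketch of ordinary (inverted) dropout with a fixed, hand-chosen rate. It does not implement the Bayesian rate optimization the abstract describes; the function name, the list-based activations, and the seed are assumptions made for illustration.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout on a list of activations.

    During training each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so the expected output equals
    the input. At prediction time the values pass through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, rate=0.5, rng=rng))
print(dropout(activations, rate=0.5, rng=rng, training=False))
```

Here `rate` is a hyperparameter fixed in advance; the abstract's point is that a Bayesian reading of dropout allows this quantity to be optimized rather than set by hand.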

algorithm, dropout, dropout rate, (16 more...)

arXiv:1412.7003

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)

Tutorial on Structured Continuous-Time Markov Processes

Shelton, C. R., Ciardo, G.

Journal of Artificial Intelligence Research, Dec-23-2014

A continuous-time Markov process (CTMP) is a collection of variables indexed by a continuous quantity, time. It obeys the Markov property that the distribution over a future variable is independent of past variables given the state at the present time. We introduce continuous-time Markov process representations and algorithms for filtering, smoothing, expected sufficient statistics calculations, and model estimation, assuming no prior knowledge of continuous-time processes but some basic knowledge of probability and statistics. We begin by describing "flat" or unstructured Markov processes and then move to structured Markov processes (those arising from state spaces consisting of assignments to variables) including Kronecker, decision-diagram, and continuous-time Bayesian network representations. We provide the first connection between decision-diagrams and continuous-time Bayesian networks.
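
The simplest "flat" case can be written in closed form. The sketch below computes the transition probabilities of a two-state continuous-time Markov process; the rate names `rate_01` and `rate_10` and the function name are assumptions for illustration, and nothing here touches the structured representations (Kronecker, decision-diagram, continuous-time Bayesian network) covered by the tutorial.

```python
import math


def two_state_transition_matrix(rate_01, rate_10, t):
    """Transition probabilities P(t) of a two-state continuous-time Markov process.

    rate_01 is the rate of leaving state 0 for state 1, rate_10 the reverse.
    Entry [i][j] is P(state at time s + t is j | state at time s is i); by the
    Markov property it does not depend on the history before time s.
    """
    total = rate_01 + rate_10
    if rate_01 < 0 or rate_10 < 0 or total == 0:
        raise ValueError("rates must be non-negative and not both zero")
    decay = math.exp(-total * t)
    pi0, pi1 = rate_10 / total, rate_01 / total  # stationary distribution
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]


for row in two_state_transition_matrix(rate_01=2.0, rate_10=1.0, t=0.5):
    print(row)
```

Each row sums to one, the matrix is the identity at `t = 0`, and both rows approach the stationary distribution as `t` grows.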

ev mdd, matrix, transition, (15 more...)

doi: 10.1613/jair.4415

AI Access Foundation

10921

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
North America > United States > Iowa (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
(5 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Industry:

Information Technology (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Model Selection in High-Dimensional Misspecified Models

Basu, Pallavi, Feng, Yang, Lv, Jinchi

arXiv.org Machine Learning, Dec-23-2014

Model selection is indispensable to high-dimensional sparse modeling in selecting the best set of covariates among a sequence of candidate models. Most existing work assumes implicitly that the model is correctly specified or of fixed dimensions. Yet model misspecification and high dimensionality are common in real applications. In this paper, we investigate two classical principles of model selection, the Kullback-Leibler divergence principle and the Bayesian principle, in the setting of high-dimensional misspecified models. Asymptotic expansions of these principles reveal that the effect of model misspecification is crucial and should be taken into account, leading to the generalized AIC and generalized BIC in high dimensions. With a natural choice of prior probabilities, we suggest the generalized BIC with prior probability, which involves a logarithmic factor of the dimensionality in penalizing model complexity. We further establish the consistency of the covariance contrast matrix estimator in a general setting. Our results and new method are supported by numerical studies.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv:1412.7468

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Locally Weighted Learning for Naive Bayes Classifier

Li, Kim-Hung, Li, Cheuk Ting

arXiv.org Machine Learning, Dec-21-2014

As a consequence of the strong and usually violated conditional independence assumption (CIA) of the naive Bayes (NB) classifier, the performance of NB becomes less and less favorable compared to sophisticated classifiers as the sample size increases. We learn from this phenomenon that when the size of the training data is large, we should either relax the assumption or apply NB to a "reduced" data set, for example by using NB as a local model. The latter approach trades the ignored information for robustness to the model assumption. In this paper, we consider using NB as a model for locally weighted data. A special weighting function is designed so that if CIA holds for the unweighted data, it also holds for the weighted data. The new method is intuitive and capable of handling class imbalance. It is theoretically more sound than the locally weighted learners of naive Bayes that base classification only on the $k$ nearest neighbors. An empirical study shows that the new method, with an appropriate choice of parameter, outperforms seven existing classifiers of similar nature.
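
To show where the conditional independence assumption enters, here is a minimal sketch of a plain (unweighted) naive Bayes classifier for categorical features. It is not the locally weighted method of the paper; the function names, the Laplace smoothing constant `alpha`, and the toy weather data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class maximizing log P(c) + sum_i log P(x_i | c).

    Summing per-feature log probabilities is exactly the conditional
    independence assumption; alpha is a Laplace smoothing constant.
    """
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("rainy", "cool")))
```

A locally weighted variant would replace the raw counts with weighted counts that depend on the query point, which is the direction the abstract describes.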

artificial intelligence, estimator, machine learning, (16 more...)

arXiv:1412.6741

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Parameter estimation in spherical symmetry groups

Chen, Yu-Hui, Wei, Dennis, Newstadt, Gregory, DeGraef, Marc, Simmons, Jeffrey, Hero, Alfred

arXiv.org Machine Learning, Dec-21-2014

This paper considers statistical estimation problems where the probability distribution of the observed random variable is invariant with respect to actions of a finite topological group. It is shown that any such distribution must satisfy a restricted finite mixture representation. When specialized to the case of distributions over the sphere that are invariant to the actions of a finite spherical symmetry group $\mathcal G$, a group-invariant extension of the von Mises-Fisher (VMF) distribution is obtained. The $\mathcal G$-invariant VMF is parameterized by location and scale parameters that specify the distribution's mean orientation and its concentration about the mean, respectively. Using the restricted finite mixture representation, these parameters can be estimated with an Expectation Maximization (EM) maximum likelihood (ML) estimation algorithm. This is illustrated for the problem of mean crystal orientation estimation under the spherically symmetric group associated with the crystal form, e.g., cubic, octahedral or hexahedral. Simulations and experiments establish the advantages of the extended VMF EM-ML estimator for data acquired by Electron Backscatter Diffraction (EBSD) microscopy of a polycrystalline nickel alloy sample.

artificial intelligence, machine learning, orientation, (17 more...)

doi: 10.1109/LSP.2014.2387206

1411.254

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.40)

Industry: Materials (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Cauchy Principal Component Analysis

Xie, Pengtao, Xing, Eric

arXiv.org Machine Learning, Dec-19-2014

Principal Component Analysis (PCA) has wide applications in machine learning, text mining and computer vision. Classical PCA based on a Gaussian noise model is fragile to noise of large magnitude. PCA methods based on a Laplace noise assumption cannot deal with dense noise effectively. In this paper, we propose Cauchy Principal Component Analysis (Cauchy PCA), a very simple yet effective PCA method which is robust to various types of noise. We use the Cauchy distribution to model noise and derive Cauchy PCA under the maximum likelihood estimation (MLE) framework with a low-rank constraint. Our method can robustly estimate the low-rank matrix regardless of whether the noise is large or small, dense or sparse. We analyze the robustness of Cauchy PCA from a robust statistics view and present an efficient singular value projection optimization method. Experimental results on both simulated data and real applications demonstrate the robustness of Cauchy PCA to various noise patterns.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv:1412.6506

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
