Bayesian Learning
Bethe Learning of Conditional Random Fields via MAP Decoding
Tang, Kui, Ruozzi, Nicholas, Belanger, David, Jebara, Tony
Many machine learning tasks can be formulated in terms of predicting structured outputs. In frameworks such as the structured support vector machine (SVM-Struct) and the structured per-ceptron, discriminative functions are learned by iteratively applying efficient maximum a posteri-ori (MAP) decoding. However, maximum likelihood estimation (MLE) of probabilistic models over these same structured spaces requires computing partition functions, which is generally intractable. This paper presents a method for learning discrete exponential family models using the Bethe approximation to the MLE. Remarkably, this problem also reduces to iterative (MAP) decoding. This connection emerges by combining the Bethe approximation with a Frank-Wolfe (FW) algorithm on a convex dual objective which circumvents the intractable partition function. The result is a new single loop algorithm MLE-Struct, which is substantially more efficient than previous double-loop methods for approximate maximum likelihood estimation. Our algorithm outperforms existing methods in experiments involving image segmentation, matching problems from vision, and a new dataset of university roommate assignments.
Heteroscedastic Treed Bayesian Optimisation
Assael, John-Alexander M., Wang, Ziyu, Shahriari, Bobak, de Freitas, Nando
Optimising black-box functions is important in many disciplines, such as tuning machine learning models, robotics, finance and mining exploration. Bayesian optimisation is a state-of-the-art technique for the global optimisation of black-box functions which are expensive to evaluate. At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions. However, in many cases a single Gaussian process is not flexible enough to capture non-stationarity in the objective function. Consequently, heteroscedasticity negatively affects performance of traditional Bayesian methods. In this paper, we propose a novel prior model with hierarchical parameter learning that tackles the problem of non-stationarity in Bayesian optimisation. Our results demonstrate substantial improvements in a wide range of applications, including automatic machine learning and mining exploration.
Optimality of Poisson processes intensity learning with Gaussian processes
Kirichenko, Alisa, van Zanten, Harry
In this paper we provide theoretical support for the so-called "Sigmoidal Gaussian Cox Process" approach to learning the intensity of an inhomogeneous Poisson process on a $d$-dimensional domain. This method was proposed by Adams, Murray and MacKay (ICML, 2009), who developed a tractable computational approach and showed in simulation and real data experiments that it can work quite satisfactorily. The results presented in the present paper provide theoretical underpinning of the method. In particular, we show how to tune the priors on the hyper parameters of the model in order for the procedure to automatically adapt to the degree of smoothness of the unknown intensity and to achieve optimal convergence rates.
Real-Time Optimal Selection of Multirobot Coalition Formation Algorithms Using Conceptual Clustering
Sen, Sayan Dev (Vanderbilt University) | Adams, Julie Ann (Vanderbilt University)
The presented framework is the The multirobot coalition formation problem seeks to intelligently first to leverage a conceptual clustering technique to partition partition a team of heterogeneous robots into any set of coalition formation algorithms in order to derive coalitions for a set of real-world tasks. Besides being N Pan optimal hierarchy classification tree, given any classification complete (Sandholm et al. 1999), the problem is also hard taxonomy. The results contribute to the state-ofthe-art to approximate (Service and Adams 2011a). Traditional approaches in multiagent systems by demonstrating the existence to solving the problem include a number of greedy of crucial patterns and intricate relationships among existing algorithms (Shehory and Kraus 1998; Vig and Adams coalition algorithms.
Modeling Ecological Integrity with Bayesian Belief Networks
Barrios, Juan M. (National Commission for Knowledge and Use of Biodiversity) | Sierra-Alcocer, Raรบl (National Commission for Knowledge and Use of Biodiversity) | Gonzรกlez-Salazar, Constantino (National Commission for Knowledge and Use of Biodiversity) | Mora, Franz E. (National Commission for Knowledge and Use of Biodiversity) | Munguรญa, Mariana (National Commission for Knowledge and Use of Biodiversity) | Pรฉrez-Maqueo, Octavio M. (National Commission for Knowledge and Use of Biodiversity) | Trejo, Isabel (National Commission for Knowledge and Use of Biodiversity)
Although the concept of ecological integrity is referred in many country legislations there is no consensus on how to formalize and implement it. One possible definition is as the capacity of an ecosystem to support and maintain a balanced, integrated, and adaptive community of organisms having a species composition, diversity, and functional organization comparable to that of a natural habitat of the region. Our objective is to model this interpretation of ecological integrity from a set of ecological measures that can be estimated from ecological inventory data.
Trajectory Analysis Based on Clustering and Casual Structures
Wong, Raymond K. (University of New South Wales) | Chu, Victor (University of New South Wales) | Ghanavati, Mojgan (University of New South Wales) | Hamzehei, Asso (University of New South Wales)
Causal structure discovery methods are investigated recently but none of them has taken possible time-varying structure into consideration. This paper uses a notion of causal time-varying dynamic Bayesian network (CTV-DBN) and define a causal boundary to govern cross-time information sharing. CTV-DBN is constructed by using asymmetric kernels to address sample scarcity and to adhere to causal principles; while maintaining good variance and bias trade-off. Upon satisfying causal Markov assumption, causal inference can be made based on manipulation rule. We explore trajectory data collected from taxis in Beijing which exhibit heterogeneous patterns, data sparseness and distribution skewness. Experiments show that by using casual structures and trajectory clustering, we can analyse the spatio-temporal behavior of the trajectory data.
Nonparametric Bayesian Learning of Other Agents' Policies in Interactive POMDPs
Panella, Alessandro (University of Illinois at Chicago) | Gmytrasiewicz, Piotr (University of Illinois at Chicago)
We consider an autonomous agent facing a partially observable, stochastic, multiagent environment where the unknown policies of other agents are represented as finite state controllers (FSCs). We show how an agent can (i) learn the FSCs of the other agents, and (ii) exploit these models during interactions. To separate the issues of off-line versus on-line learning we consider here an off-line two-phase approach. During the first phase the agent observes as the other player(s) are interacting with the environment (the observations may be imperfect and the learning agent is not taking part in the interaction.) The collected data is used to learn an ensemble of FSCs that explain the behavior of the other agent(s) using a Bayesian non-parametric (BNP) approach. We verify the quality of the learned models during the second phase by allowing the agent to compute its own optimal policy and interact with the observed agent. The optimal policy for the learning agent is obtained by solving an interactive POMDP in which the states are augmented by the other agent(s)' possible FSCs. The advantage of using the Bayesian nonparametric approach in the first phase is that the complexity (number of nodes) of the learned controllers is not bounded a priori. Our two-phase approach is preliminary and separates the learning using BNP from the complexities of learning on-line while the other agent may be modifying its policy (on-line approach is subject of our future work.) We describe our implementation and results in a multiagent Tiger domain. Our results show that learning improves the agent's performance, which increases with the amount of data collected during the learning phase.
Covariance Matrices and Influence Scores for Mean Field Variational Bayes
Giordano, Ryan, Broderick, Tamara
Mean field variational Bayes (MFVB) is a popular posterior approximation method due to its fast runtime on large-scale data sets. However, it is well known that a major failing of MFVB is that it underestimates the uncertainty of model variables (sometimes severely) and provides no information about model variable covariance. We develop a fast, general methodology for exponential families that augments MFVB to deliver accurate uncertainty estimates for model variables -- both for individual variables and coherently across variables. MFVB for exponential families defines a fixed-point equation in the means of the approximating posterior, and our approach yields a covariance estimate by perturbing this fixed point. Inspired by linear response theory, we call our method linear response variational Bayes (LRVB). We also show how LRVB can be used to quickly calculate a measure of the influence of individual data points on parameter point estimates. We demonstrate the accuracy and scalability of our method by learning Gaussian mixture models for both simulated and real data.
Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
Kasarapu, Parthan, Allison, Lloyd
Mixture modelling involves explaining some observed evidence using a combination of probability distributions. The crux of the problem is the inference of an optimal number of mixture components and their corresponding parameters. This paper discusses unsupervised learning of mixture models using the Bayesian Minimum Message Length (MML) criterion. To demonstrate the effectiveness of search and inference of mixture parameters using the proposed approach, we select two key probability distributions, each handling fundamentally different types of data: the multivariate Gaussian distribution to address mixture modelling of data distributed in Euclidean space, and the multivariate von Mises-Fisher (vMF) distribution to address mixture modelling of directional data distributed on a unit hypersphere. The key contributions of this paper, in addition to the general search and inference methodology, include the derivation of MML expressions for encoding the data using multivariate Gaussian and von Mises-Fisher distributions, and the analytical derivation of the MML estimates of the parameters of the two distributions. Our approach is tested on simulated and real world data sets. For instance, we infer vMF mixtures that concisely explain experimentally determined three-dimensional protein conformations, providing an effective null model description of protein structures that is central to many inference problems in structural bioinformatics. The experimental results demonstrate that the performance of our proposed search and inference method along with the encoding schemes improve on the state of the art mixture modelling techniques.
1-Bit Matrix Completion under Exact Low-Rank Constraint
Bhaskar, Sonia, Javanmard, Adel
We consider the problem of noisy 1-bit matrix completion under an exact rank constraint on the true underlying matrix $M^*$. Instead of observing a subset of the noisy continuous-valued entries of a matrix $M^*$, we observe a subset of noisy 1-bit (or binary) measurements generated according to a probabilistic model. We consider constrained maximum likelihood estimation of $M^*$, under a constraint on the entry-wise infinity-norm of $M^*$ and an exact rank constraint. This is in contrast to previous work which has used convex relaxations for the rank. We provide an upper bound on the matrix estimation error under this model. Compared to the existing results, our bound has faster convergence rate with matrix dimensions when the fraction of revealed 1-bit observations is fixed, independent of the matrix dimensions. We also propose an iterative algorithm for solving our nonconvex optimization with a certificate of global optimality of the limiting point. This algorithm is based on low rank factorization of $M^*$. We validate the method on synthetic and real data with improved performance over existing methods.