Bayesian Learning
Graph Estimation From Multi-attribute Data
Kolar, Mladen, Liu, Han, Xing, Eric P.
Many real-world network problems concern multivariate nodal attributes, such as image, textual, and multi-view feature vectors on nodes, rather than simple univariate nodal attributes. Existing graph estimation methods built on Gaussian graphical models and covariance selection algorithms cannot handle such data, nor can the theory developed around those methods be applied directly. In this paper, we propose a new principled framework for estimating graphs from multi-attribute data. Instead of estimating partial correlations, as in the current literature, our method estimates partial canonical correlations, which naturally accommodate complex nodal features. Computationally, we provide an efficient algorithm that exploits the multi-attribute structure. Theoretically, we provide sufficient conditions that guarantee consistent graph recovery. Extensive simulation studies demonstrate the performance of our method under various conditions. Furthermore, we provide illustrative applications to uncovering gene regulatory networks from gene and protein profiles, and to uncovering a brain connectivity graph from functional magnetic resonance imaging data.
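To make "partial canonical correlation" concrete, the sketch below computes the sample quantity for a single pair of nodes. It regresses each node's attribute block on the attributes of all other nodes, then takes the largest canonical correlation between the two residual blocks. This is an illustration under our own assumptions (plain least-squares residualization, no penalty, one node pair at a time), not the authors' estimator or their efficient algorithm. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names. Plain least-squares residualization without any
# penalty is an assumption for illustration, not the paper's estimator.


def residualize(Y, Z):
    """Remove from Y (n x p) its least-squares fit on Z (n x q) plus an intercept."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, Y, rcond=None)
    return Y - Z1 @ coef


def inv_sqrt(S, eps=1e-10):
    """Inverse symmetric square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.maximum(w, eps) ** -0.5) @ V.T


def partial_canonical_correlation(Xa, Xb, Z):
    """Largest canonical correlation between attribute blocks Xa (n x pa) and
    Xb (n x pb) after removing the linear effect of the other nodes' attributes Z."""
    Ra, Rb = residualize(Xa, Z), residualize(Xb, Z)
    # Whiten each residual block; the singular values of the whitened
    # cross-product matrix are the canonical correlations.
    M = inv_sqrt(Ra.T @ Ra) @ (Ra.T @ Rb) @ inv_sqrt(Rb.T @ Rb)
    return float(np.linalg.svd(M, compute_uv=False)[0])


rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 3))  # attributes of the remaining nodes
Xa = Z @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))
Xb_indep = Z @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))  # no direct a-b link
Xb_edge = Xa + 0.5 * rng.normal(size=(n, 2))                      # direct a-b link
print(partial_canonical_correlation(Xa, Xb_indep, Z))  # small
print(partial_canonical_correlation(Xa, Xb_edge, Z))   # large
```

Both blocks in the first pair depend on the other nodes, so their ordinary correlation is substantial. Conditioning on those nodes is what separates "no edge" from "edge".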
The Mahalanobis distance for functional data with applications to classification
Joseph, Esdras, Galeano, Pedro, Lillo, Rosa E.
This paper presents a general notion of Mahalanobis distance for functional data that extends the classical multivariate concept to situations where the observed data are points belonging to curves generated by a stochastic process. More precisely, it introduces a new semi-distance for functional observations that generalizes the usual Mahalanobis distance for multivariate datasets. To this end, the construction uses a regularized square root inverse operator in Hilbert spaces. Some of the main characteristics of the functional Mahalanobis semi-distance are shown. Afterwards, new versions of several well-known functional classification procedures are developed using the Mahalanobis distance for functional data as a measure of proximity between functional observations. Through a Monte Carlo study and the analysis of two real data examples, the performance of several well-known functional classification procedures is compared with that of the same procedures used in conjunction with the Mahalanobis distance for functional data, with positive results.
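For curves observed on a common grid, a semi-distance of this kind can be sketched in a few lines. The sketch regularizes by spectral truncation: it keeps only the k leading eigenpairs of the sample covariance and sums squared principal scores divided by their eigenvalues. Curves that differ only in discarded directions are at distance 0, which is why the result is a semi-distance. Truncation is our assumed stand-in for the paper's regularized square root inverse operator, not necessarily the same regularization. With an equispaced grid and Riemann-sum inner products, the grid spacing cancels between the scores and the eigenvalues. The function names are hypothetical.

```python
import numpy as np

# Assumption: spectral truncation to k components stands in for the paper's
# regularized square root inverse operator. Function names are hypothetical.


def fit_fpca(X, k):
    """X: (n_curves, n_grid) curves sampled on a common equispaced grid.
    Returns the mean curve and the k leading eigenpairs of the sample covariance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(X))  # ascending eigenvalues
    top = np.argsort(evals)[::-1][:k]                  # indices of the k largest
    return mean, evals[top], evecs[:, top]


def functional_mahalanobis(x, y, evals, evecs):
    """Truncated Mahalanobis semi-distance between two discretized curves."""
    scores = (x - y) @ evecs  # principal scores of the difference curve
    return float(np.sqrt(np.sum(scores ** 2 / evals)))


rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100, endpoint=False)
s, c = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
a = rng.normal(0, 2, size=500)  # standard deviation 2 along the sine direction
b = rng.normal(0, 1, size=500)  # standard deviation 1 along the cosine direction
X = np.outer(a, s) + np.outer(b, c) + 0.1 * rng.normal(size=(500, 100))
mean, evals, evecs = fit_fpca(X, k=2)
print(functional_mahalanobis(mean + 2 * s, mean, evals, evecs))  # about 1
print(functional_mahalanobis(mean + 1 * c, mean, evals, evecs))  # about 1
```

Both shifts are one standard deviation in their own direction, so both distances are near 1. The Euclidean ($L^2$) distances differ by a factor of 2.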
Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data
Savage, Richard S., Ghahramani, Zoubin, Griffin, Jim E., Kirk, Paul, Wild, David L.
We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$), but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow-up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
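The prognostic claims above are stated as log-rank p-values. For readers unfamiliar with the test, here is a minimal two-group log-rank sketch. The paper compares several subtypes and corrects for multiple testing, which this sketch does not attempt. `logrank_two_group` is a hypothetical name.

```python
import math

import numpy as np

# Sketch of the standard two-group log-rank test; not the paper's
# multi-subtype analysis. The function name is hypothetical.


def logrank_two_group(time, event, group):
    """time: follow-up times; event: True if the event was observed (False = censored);
    group: True for members of group 1. Returns (chi-square statistic, p-value)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    group = np.asarray(group, bool)
    observed1 = expected1 = variance = 0.0
    for t in np.unique(time[event]):  # distinct observed event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        died = event & (time == t)
        d, d1 = died.sum(), (died & group).sum()
        observed1 += d1
        expected1 += d * n1 / n  # expected group-1 events under the null
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (observed1 - expected1) ** 2 / variance
    # Chi-square with 1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    return float(stat), math.erfc(math.sqrt(stat / 2))


times = list(range(1, 11))
events = [True] * 10
groups = [True] * 5 + [False] * 5  # group 1 has all the early events
print(logrank_two_group(times, events, groups))  # statistic about 9.7, p < 0.01
```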
The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF
Brümmer, Niko, de Villiers, Edward
The change of two orders of magnitude in the 'new DCF' of NIST's SRE'10, relative to the 'old DCF' evaluation criterion, posed a difficult challenge for participants and evaluator alike. Initially, participants were at a loss as to how to calibrate their systems, while the evaluator underestimated the required number of evaluation trials. After the fact, it is now obvious that both calibration and evaluation require very large sets of trials. This poses the challenges of (i) how to decide what number of trials is enough, and (ii) how to process such large data sets with reasonable memory and CPU requirements. After SRE'10, at the BOSARIS Workshop, we built solutions to these problems into the freely available BOSARIS Toolkit. This paper explains the principles and algorithms behind this toolkit. The main contributions of the toolkit are: 1. The Normalized Bayes Error-Rate Plot, which analyses likelihood-ratio calibration over a wide range of DCF operating points. These plots also help in judging the adequacy of the sizes of calibration and evaluation databases. 2. Efficient algorithms to compute DCF and minDCF for large score files, over the range of operating points required by these plots. 3. A new score file format, which facilitates working with very large trial lists. 4. A faster logistic regression optimizer for fusion and calibration. 5. A principled way to define EER (equal error rate), which is of practical interest when the absolute error count is small.
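The DCF and minDCF quantities can be illustrated with a short NumPy sketch. It assumes the scores are log-likelihood ratios and uses the usual normalized DCF, which divides by the cost of the better of the two trivial systems. It obtains minDCF from one sort plus cumulative sums, so every threshold is evaluated in a single pass. This is not the BOSARIS Toolkit's code, and the function names are hypothetical. For orientation, the SRE'10 "new DCF" is usually quoted with P_tar = 0.001 and C_miss = C_fa = 1.

```python
import numpy as np

# Assumption: scores are log-likelihood ratios. This is an illustrative sketch,
# not the BOSARIS Toolkit implementation; function names are hypothetical.


def bayes_threshold(p_tar, c_miss, c_fa):
    """LLR threshold at which accepting and rejecting have equal expected cost."""
    return np.log((1 - p_tar) * c_fa / (p_tar * c_miss))


def actual_and_min_dcf(tar, non, p_tar, c_miss=1.0, c_fa=1.0):
    tar, non = np.asarray(tar, float), np.asarray(non, float)
    norm = min(p_tar * c_miss, (1 - p_tar) * c_fa)  # cost of the best trivial system

    # Actual DCF: Bayes decisions that take the scores at face value as LLRs.
    th = bayes_threshold(p_tar, c_miss, c_fa)
    p_miss, p_fa = np.mean(tar < th), np.mean(non >= th)
    act = (p_tar * c_miss * p_miss + (1 - p_tar) * c_fa * p_fa) / norm

    # minDCF: sort once, then evaluate every cut "reject the i lowest scores".
    scores = np.concatenate([tar, non])
    is_tar = np.concatenate([np.ones(len(tar)), np.zeros(len(non))])
    order = np.argsort(scores, kind="mergesort")
    s, y = scores[order], is_tar[order]
    miss = np.concatenate([[0.0], np.cumsum(y)]) / len(tar)
    fa = 1.0 - np.concatenate([[0.0], np.cumsum(1 - y)]) / len(non)
    # A threshold cannot split tied scores, so keep only cuts between distinct values.
    valid = np.concatenate([[True], s[1:] > s[:-1], [True]])
    dcf = (p_tar * c_miss * miss + (1 - p_tar) * c_fa * fa) / norm
    return float(act), float(dcf[valid].min())


act, mn = actual_and_min_dcf([5.0, 6.0, 7.0], [-3.0, -2.0, -1.0], p_tar=0.001)
print(act, mn)  # perfectly separated but miscalibrated: actual about 0.667, min 0.0
```

The gap between actual DCF and minDCF is the calibration loss at that operating point. The cumulative `miss` and `fa` arrays depend only on the sort, so a plot over many operating points could reuse them.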
Predicting Behavior in Unstructured Bargaining with a Probability Distribution
In experimental tests of human behavior in unstructured bargaining games, many joint utility outcomes are typically found to occur, not just one. This suggests that we predict the outcome of such a game as a probability distribution. This is in contrast to what is conventionally done (e.g., in the Nash bargaining solution), which is to predict a single outcome. We show how to translate Nash's bargaining axioms to provide a distribution over outcomes rather than a single outcome. We then prove that a subset of those axioms forces the distribution over utility outcomes to be a power-law distribution. Unlike Nash's original result, our result holds even if the feasible set is finite. When the feasible set is convex and comprehensive, the mode of the power-law distribution is the Harsanyi bargaining solution, and if we require symmetry it is the Nash bargaining solution. However, in general these modes of the joint utility distribution are not the experimentalist's Bayes-optimal predictions for the joint utility. Nor are the bargains corresponding to the modes of those joint utility distributions the modes of the distribution over bargains in general, since more than one bargain may result in the same joint utility. After introducing distributional bargaining solution concepts, we show how an external regulator can use them to optimally design an unstructured bargaining scenario. Throughout, we demonstrate our analysis in computational experiments involving flight rerouting negotiations in the National Airspace System. We emphasize that while our results are formulated for unstructured bargaining, they can also be used to make predictions for noncooperative games where the modeler knows the utility functions of the players over possible outcomes of the game, but does not know the move spaces the players use to determine those outcomes.
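As a toy illustration of a distributional prediction, the sketch below puts probability on each joint utility in a finite feasible set. Each probability is proportional to a product of powers of the players' gains over the disagreement point. That specific power-law form and its exponents are our assumptions for illustration; the abstract only states that the distribution is a power law. With equal exponents, the mode is the outcome that maximizes the Nash product over the finite set. The mean, which a squared-error Bayes-optimal predictor would report, generally differs from the mode. The function name is hypothetical.

```python
import numpy as np

# Assumption: the product-of-powers form and the exponents are illustrative,
# not taken from the paper. The function name is hypothetical.


def bargaining_distribution(U, d, alpha=1.0):
    """U: (m, n_players) finite feasible set of joint utilities; d: disagreement point.
    Returns P(u) proportional to prod_i (u_i - d_i) ** alpha_i, with zero mass on
    outcomes that do not strictly improve on d for every player."""
    U, d = np.asarray(U, float), np.asarray(d, float)
    alpha = np.broadcast_to(np.asarray(alpha, float), d.shape)
    gains = U - d
    ok = np.all(gains > 0, axis=1)
    if not ok.any():
        raise ValueError("no outcome improves on the disagreement point")
    logw = np.full(len(U), -np.inf)
    logw[ok] = (alpha * np.log(gains[ok])).sum(axis=1)
    w = np.exp(logw - logw[ok].max())  # work in log space for numerical stability
    return w / w.sum()


U = np.array([[1.0, 1.0], [2.0, 0.5], [0.5, 2.0], [1.5, 1.2], [0.0, 3.0]])
P = bargaining_distribution(U, d=[0.0, 0.0])
print(P.round(3))  # mode at [1.5, 1.2], the Nash-product maximizer; last outcome gets 0
print(P @ U)       # mean joint utility, which is not one of the feasible outcomes here
```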
The PAV algorithm optimizes binary proper scoring rules
Brümmer, Niko, du Preez, Johan
There has been much recent interest in applying the pool-adjacent-violators (PAV) algorithm to calibrate the probabilistic outputs of automatic pattern recognition and machine learning algorithms. Special cost functions, known as proper scoring rules, form natural objective functions for judging the goodness of such calibration. We show that for binary pattern classifiers, the non-parametric optimization of calibration, subject to a monotonicity constraint, can be solved by PAV and that this solution is optimal for all regular binary proper scoring rules. This extends previous results, which were limited to convex binary proper scoring rules. We further show that this result holds not only for calibration of probabilities, but also for calibration of log-likelihood-ratios, in which case optimality holds independently of the prior probabilities of the pattern classes.
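For reference, the pool-adjacent-violators algorithm itself is short. The sketch below sorts examples by classifier score and fits the nondecreasing sequence of probabilities closest in squared error to the 0/1 labels. It assumes distinct scores; tied scores would also need to be pooled. The abstract's result is that this same monotone solution is optimal for every regular binary proper scoring rule, not only for the squared (Brier) loss it is computed under here. The function names are hypothetical.

```python
import numpy as np

# Assumes distinct scores; tied scores would also need to be pooled.
# Function names are hypothetical.


def pav(y):
    """Nondecreasing least-squares fit to the sequence y (already ordered by score)."""
    blocks = []  # each block is [sum of labels, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def pav_calibrate(scores, labels):
    """Calibrated probability for each example, returned in the original order."""
    order = np.argsort(scores, kind="mergesort")
    fitted = pav(np.asarray(labels, float)[order])
    out = np.empty_like(fitted)
    out[order] = fitted
    return out


print(pav_calibrate([0.9, 0.1, 0.5, 0.3, 0.7], [1, 0, 1, 0, 0]))  # 1, 0, 0.5, 0, 0.5
```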
ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures
Lovell, Dan, Malmaud, Jonathan, Adams, Ryan P., Mansinghka, Vikash K. (Massachusetts Institute of Technology and Harvard University)
The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independencies between the atoms that form the random measure. This conditional independence enables many of the Markov chain transition operators for DP inference to be simulated in parallel across multiple cores. Applied to mixture modeling, our approach enables the Dirichlet process to simultaneously learn clusters that describe the data and superclusters that define the granularity of parallelization. Unlike previous approaches, our technique does not require alteration of the model and leaves the true posterior distribution invariant. It also naturally lends itself to a distributed software implementation in terms of Map-Reduce, which we test in cluster configurations of over 50 machines and 100 cores.
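The abstract does not spell out the reparameterization. One natural candidate, assumed here, is the superposition property of the DP. If the supercluster weights γ follow Dirichlet(α/K, …, α/K) and each G_k independently follows DP(α/K, H), then Σ_k γ_k G_k is distributed as DP(α, H). Once γ is fixed, the clusters inside each supercluster can be handled by a separate worker. The NumPy sketch below only samples truncated cluster weights under that construction. The function names are hypothetical.

```python
import numpy as np

# Assumption: the superposition construction below is used for illustration;
# the paper's exact reparameterization may differ. Function names are hypothetical.


def stick_breaking(concentration, n_atoms, rng):
    """Truncated stick-breaking weights of a DP with the given concentration."""
    betas = rng.beta(1.0, concentration, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining


def supercluster_weights(alpha, n_super, n_atoms, rng):
    """Cluster weights of DP(alpha, H) built from n_super independent DP(alpha/K, H)
    pieces mixed by Dirichlet(alpha/K, ..., alpha/K) supercluster weights.
    Row k holds the (truncated) weights of the clusters inside supercluster k."""
    gamma = rng.dirichlet(np.full(n_super, alpha / n_super))
    local = np.stack([stick_breaking(alpha / n_super, n_atoms, rng) for _ in range(n_super)])
    return gamma[:, None] * local


rng = np.random.default_rng(2)
W = supercluster_weights(alpha=2.0, n_super=4, n_atoms=200, rng=rng)
print(W.shape, W.sum())  # (4, 200), total mass of about 1 up to truncation
```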
Statistical Anomaly Detection for Train Fleets
Holst, Anders (Swedish Institute of Computer Science) | Bohlin, Markus (Swedish Institute of Computer Science) | Ekman, Jan (Swedish Institute of Computer Science) | Sellin, Ola (Bombardier Transportation) | Lindström, Björn (Addiva Consulting AB) | Larsen, Stefan (Addiva Eduro AB)
The Swedish Institute of Computer Science (SICS) has for several years developed methods for statistical anomaly detection based on a framework called Bayesian principal anomaly (Holst and Ekman 2011). In this article we describe a novel application domain for the anomaly-detection method: condition monitoring of trains (Holst, Ekman, and Larsen 2006). It is […] statistical models. There are currently many popular anomaly-detection methods based on nonparametric models (see, for example, Ahmed, […]) […] model is very general since the parametric forms of the distributions need not be known.
Addtrack is a tool developed originally by Bombardier Transportation for general analysis, monitoring, and visualization of train conditions and […] "intelligent" in the sense that analysis modules, such as the one described in this article, can be used to preprocess and visualize data sets. Addtrack, including the anomaly-detection module described in this article, is currently deployed in Sweden, India, China, and […]
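The abstract names the Bayesian principal anomaly framework without specifying it. As a simplified stand-in, and explicitly not the SICS formulation, one can score how unusual a new reading is relative to readings from the rest of the fleet. The score is the probability that a draw from a fitted reference distribution would have a higher density than the observed value. The sketch uses a plug-in Gaussian for a single signal. `density_anomaly` is a hypothetical name.

```python
import math

import numpy as np

# Stand-in only: a plug-in Gaussian density score, not the SICS Bayesian
# principal anomaly method. The function name is hypothetical.


def density_anomaly(x, reference):
    """Probability that a draw from a Gaussian fitted to `reference` is more
    probable (higher density) than x: near 0 for typical values, near 1 for outliers."""
    ref = np.asarray(reference, float)
    mu, sd = ref.mean(), ref.std(ddof=1)
    z = abs(x - mu) / sd
    # For a Gaussian, a draw X has higher density than x exactly when |X - mu| < |x - mu|.
    return math.erf(z / math.sqrt(2))


fleet = [9.0, 10.0, 11.0]            # e.g. one signal from comparable vehicles
print(density_anomaly(10.0, fleet))  # 0.0
print(density_anomaly(13.0, fleet))  # about 0.997 (three standard deviations out)
```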
Sparsistent Estimation of Time-Varying Discrete Markov Random Fields
In recent years, we have witnessed rapid advances in data-acquisition techniques in many areas, including biology, engineering, and the social sciences. As a result, new statistical and machine learning techniques are needed to help us better understand the complexities underlying large, noisy data sets. Networks are commonly used to abstract noisy data and to provide insight into regularities and dependencies between observed variables. For example, in a biological study, the nodes of a network can represent genes in an organism, and the edges can represent associations or regulatory dependencies among genes. In a social domain, nodes can represent actors, and edges can represent interactions between actors. Popular recent techniques for modeling and exploring networks are based on structure estimation in probabilistic graphical models, specifically Markov Random Fields (MRFs).
A Semiparametric Bayesian Extreme Value Model Using a Dirichlet Process Mixture of Gamma Densities
In recent years, extreme value mixture models have been proposed that combine a distribution for the "bulk part" below a threshold with a generalized Pareto distribution (GPD) for the tail. Different distributions have been proposed for modelling the "bulk part", with the threshold treated as a parameter to be estimated. The first approach allowing a transition between the bulk and tail parts is that of Frigessi, Haug & Harvard (2003). Frigessi et al. (2003) use a Weibull distribution for the bulk part, a GPD for the tail, and the location-scale Cauchy cdf as the transition function, and they estimate the parameters by maximum likelihood. However, in the Frigessi et al. (2003) approach, maximum likelihood estimation in the bulk part can produce multiple modes and hence identifiability problems. Behrens, Lopez & Gammerman (2004) and Carreu & Bengio (2009) consider Gamma and Normal distributions, respectively, for the bulk part.
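A threshold mixture in the style of Behrens et al. (2004) is easy to write down. Below the threshold u, the density is the bulk density h(x). Above u, the remaining mass 1 − H(u) is spread according to a GPD with shape ξ and scale σ, so f(x) = (1 − H(u)) g_GPD(x − u; ξ, σ) for x > u. The sketch uses a finite mixture of gamma densities for the bulk as a stand-in for the Dirichlet process mixture of gammas in the title; a truncated DP mixture has this form. The class and parameter values are hypothetical. Only the standard library is used, and the regularized lower incomplete gamma function is computed from its power series.

```python
import math

# Assumption: a finite gamma mixture stands in for the DP mixture of gammas;
# the class name and parameter values are hypothetical.


def reg_lower_gamma(a, x, tol=1e-15, max_iter=100_000):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while n < max_iter:
        n += 1
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def gamma_pdf(x, shape, rate):
    if x <= 0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(x) - rate * x - math.lgamma(shape))


def gpd_sf(y, xi, sigma):
    """Survival function of the generalized Pareto distribution at excess y >= 0."""
    if xi == 0:
        return math.exp(-y / sigma)
    t = 1 + xi * y / sigma
    return t ** (-1 / xi) if t > 0 else 0.0


def gpd_pdf(y, xi, sigma):
    if xi == 0:
        return math.exp(-y / sigma) / sigma
    t = 1 + xi * y / sigma
    return t ** (-1 / xi - 1) / sigma if t > 0 else 0.0


class GammaMixtureGPD:
    """Bulk: sum_j w_j Gamma(shape_j, rate_j) below u; tail: GPD(xi, sigma) above u."""

    def __init__(self, weights, shapes, rates, u, xi, sigma):
        self.comps = list(zip(weights, shapes, rates))
        self.u, self.xi, self.sigma = u, xi, sigma
        self.bulk_u = self.bulk_cdf(u)  # H(u): bulk mass below the threshold

    def bulk_cdf(self, x):
        return sum(w * reg_lower_gamma(a, b * x) for w, a, b in self.comps)

    def pdf(self, x):
        if x <= self.u:
            return sum(w * gamma_pdf(x, a, b) for w, a, b in self.comps)
        return (1 - self.bulk_u) * gpd_pdf(x - self.u, self.xi, self.sigma)

    def cdf(self, x):
        if x <= self.u:
            return self.bulk_cdf(x)
        return self.bulk_u + (1 - self.bulk_u) * (1 - gpd_sf(x - self.u, self.xi, self.sigma))


model = GammaMixtureGPD([0.6, 0.4], [2.0, 5.0], [1.0, 1.0], u=6.0, xi=0.2, sigma=1.5)
print(model.cdf(6.0), model.cdf(50.0))
```

Because the tail carries exactly the mass the bulk leaves above u, the density integrates to 1 without a separate normalizing constant. The price is that it is generally discontinuous at u.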