Uncertainty
The Mahalanobis distance for functional data with applications to classification
Joseph, Esdras, Galeano, Pedro, Lillo, Rosa E.
This paper presents a general notion of Mahalanobis distance for functional data that extends the classical multivariate concept to situations where the observed data are points belonging to curves generated by a stochastic process. More precisely, a new semi-distance for functional observations that generalizes the usual Mahalanobis distance for multivariate datasets is introduced. For that, the development uses a regularized square root inverse operator in Hilbert spaces. Some of the main characteristics of the functional Mahalanobis semi-distance are shown. Afterwards, new versions of several well-known functional classification procedures are developed using the Mahalanobis distance for functional data as a measure of proximity between functional observations. The performance of several well-known functional classification procedures is compared with that of the same methods used in conjunction with the Mahalanobis distance for functional data, with positive results, through a Monte Carlo study and the analysis of two real data examples.
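As a rough illustration of the idea, the sketch below computes a Mahalanobis-type semi-distance between two curves observed on a common grid. It assumes that the regularization consists of keeping only the K leading principal components of the sample covariance, and it ignores grid spacing; the function name `functional_mahalanobis` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def functional_mahalanobis(x, y, sample, n_components):
    """Semi-distance between curves x and y sampled on a common grid.

    sample: (n_curves, n_grid) array used to estimate the covariance.
    n_components: number K of leading principal components kept
    (the regularization assumed in this sketch).
    """
    sample = np.asarray(sample, dtype=float)
    centered = sample - sample.mean(axis=0)
    cov = centered.T @ centered / (sample.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, phi = eigvals[order], eigvecs[:, order]
    # Principal component scores of x - y, each scaled by 1 / sqrt(eigenvalue).
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    scores = phi.T @ diff / np.sqrt(lam)
    return float(np.sqrt(np.sum(scores ** 2)))
```

With all components kept and a full-rank covariance, this reduces to the classical multivariate Mahalanobis distance. With fewer components it is only a semi-distance, because two different curves whose difference is orthogonal to the retained components are at distance zero.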
Identifying cancer subtypes in glioblastoma by combining genomic, transcriptomic and epigenomic data
Savage, Richard S., Ghahramani, Zoubin, Griffin, Jim E., Kirk, Paul, Wild, David L.
We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
Convergence of latent mixing measures in finite and infinite mixture models
This paper studies convergence behavior of latent mixing measures that arise in finite and infinite mixture models, using transportation distances (i.e., Wasserstein metrics). The relationship between Wasserstein distances on the space of mixing measures and f-divergence functionals such as Hellinger and Kullback-Leibler distances on the space of mixture distributions is investigated in detail using various identifiability conditions. Convergence in Wasserstein metrics for discrete measures implies convergence of individual atoms that provide support for the measures, thereby providing a natural interpretation of convergence of clusters in clustering applications where mixture models are typically employed. Convergence rates of posterior distributions for latent mixing measures are established, for both finite mixtures of multivariate distributions and infinite mixtures based on the Dirichlet process.
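To make the link between Wasserstein convergence and convergence of atoms concrete, here is a minimal sketch of the first-order Wasserstein distance between two discrete probability measures. It assumes the atoms are real numbers (the paper treats general parameter spaces), in which case the distance equals the area between the two cumulative distribution functions. The function name `wasserstein1_discrete` is hypothetical.

```python
def wasserstein1_discrete(atoms_p, weights_p, atoms_q, weights_q):
    """W1 distance between two discrete probability measures on the real line."""
    # Signed mass at each atom: positive for the first measure,
    # negative for the second.
    events = sorted(
        [(a, w) for a, w in zip(atoms_p, weights_p)]
        + [(a, -w) for a, w in zip(atoms_q, weights_q)]
    )
    total, cdf_gap, prev = 0.0, 0.0, events[0][0]
    for position, mass in events:
        # Accumulate |F_p - F_q| over the interval since the previous atom.
        total += abs(cdf_gap) * (position - prev)
        cdf_gap += mass
        prev = position
    return total
```

For two mixing measures with equal weights whose atoms differ by a shift of 0.1, for example `wasserstein1_discrete([0.0, 2.0], [0.5, 0.5], [0.1, 2.1], [0.5, 0.5])`, the distance is 0.1 up to floating-point rounding, so a small distance corresponds to atoms that are individually close.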
Predicting Behavior in Unstructured Bargaining with a Probability Distribution
In experimental tests of human behavior in unstructured bargaining games, typically many joint utility outcomes are found to occur, not just one. This suggests we predict the outcome of such a game as a probability distribution. This is in contrast to what is conventionally done (e.g., in the Nash bargaining solution), which is to predict a single outcome. We show how to translate Nash's bargaining axioms to provide a distribution over outcomes rather than a single outcome. We then prove that a subset of those axioms forces the distribution over utility outcomes to be a power-law distribution. Unlike Nash's original result, our result holds even if the feasible set is finite. When the feasible set is convex and comprehensive, the mode of the power-law distribution is the Harsanyi bargaining solution, and if we require symmetry it is the Nash bargaining solution. However, in general these modes of the joint utility distribution are not the experimentalist's Bayes-optimal predictions for the joint utility. Nor are the bargains corresponding to the modes of those joint utility distributions the modes of the distribution over bargains in general, since more than one bargain may result in the same joint utility. After introducing distributional bargaining solution concepts, we show how an external regulator can use them to optimally design an unstructured bargaining scenario. Throughout we demonstrate our analysis in computational experiments involving flight rerouting negotiations in the National Airspace System. We emphasize that while our results are formulated for unstructured bargaining, they can also be used to make predictions for noncooperative games where the modeler knows the utility functions of the players over possible outcomes of the game, but does not know the move spaces the players use to determine those outcomes.
The PAV algorithm optimizes binary proper scoring rules
Brummer, Niko, Preez, Johan du
There has been much recent interest in the application of the pool-adjacent-violators (PAV) algorithm for the purpose of calibrating the probabilistic outputs of automatic pattern recognition and machine learning algorithms. Special cost functions, known as proper scoring rules, form natural objective functions to judge the goodness of such calibration. We show that for binary pattern classifiers, the non-parametric optimization of calibration, subject to a monotonicity constraint, can be solved by PAV and that this solution is optimal for all regular binary proper scoring rules. This extends previous results which were limited to convex binary proper scoring rules. We further show that this result holds not only for calibration of probabilities, but also for calibration of log-likelihood-ratios, in which case optimality holds independently of the prior probabilities of the pattern classes.
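The pooling step can be sketched as follows. This is a minimal, generic PAV implementation and not the authors' code; it assumes the 0/1 class labels have already been sorted by increasing classifier score, and the function name `pav_calibrate` is hypothetical.

```python
def pav_calibrate(labels):
    """Non-decreasing calibrated probabilities for 0/1 labels sorted by score."""
    blocks = []  # each block is [sum_of_labels, count]
    for label in labels:
        blocks.append([float(label), 1])
        # Pool the last two blocks while the earlier block has the larger mean
        # (comparison done by cross-multiplication to avoid division).
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    calibrated = []
    for total, count in blocks:
        # Every trial in a block receives the block's mean label.
        calibrated.extend([total / count] * count)
    return calibrated
```

For labels `[0, 1, 0, 1, 1]`, the second and third trials violate monotonicity and are pooled, giving `[0.0, 0.5, 0.5, 1.0, 1.0]`. Each output is the fraction of positive labels in its pooled block, which is the monotone least-squares fit; according to the abstract, the same solution is optimal for all regular binary proper scoring rules.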
ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures
Lovell, Dan, Malmaud, Jonathan, Adams, Ryan P., Mansinghka, Vikash K. (Massachusetts Institute of Technology and Harvard University)
The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independencies between the atoms that form the random measure. This conditional independence enables many of the Markov chain transition operators for DP inference to be simulated in parallel across multiple cores. Applied to mixture modeling, our approach enables the Dirichlet process to simultaneously learn clusters that describe the data and superclusters that define the granularity of parallelization. Unlike previous approaches, our technique does not require alteration of the model and leaves the true posterior distribution invariant. It also naturally lends itself to a distributed software implementation in terms of Map-Reduce, which we test in cluster configurations of over 50 machines and 100 cores.
Statistical Anomaly Detection for Train Fleets
Holst, Anders (Swedish Institute of Computer Science) | Bohlin, Markus (Swedish Institute of Computer Science) | Ekman, Jan (Swedish Institute of Computer Science) | Sellin, Ola (Bombardier Transportation) | Lindström, Björn (Addiva Consulting AB) | Larsen, Stefan (Addiva Eduro AB)
The Swedish Institute of Computer Science (SICS) has for several years developed methods for statistical anomaly detection based on a framework called Bayesian principal anomaly (Holst and Ekman 2011). In this article we describe a novel application domain for the anomaly-detection method: condition monitoring of trains (Holst, Ekman, and Larsen 2006). [...] statistical models. There are currently many popular anomaly-detection methods based on nonparametric models (see, for example, Ahmed, [...] model is very general since the parametric forms of the distributions need not be known. Addtrack is a tool developed originally by Bombardier Transportation for general analysis, monitoring, and visualization of train conditions and [...]. It is "intelligent" in the sense that analysis modules, such as the one described in this article, can be used to preprocess and visualize data sets. Addtrack, including the anomaly-detection module described in this article, is currently deployed in Sweden, India, China, and [...]
Sparsistent Estimation of Time-Varying Discrete Markov Random Fields
In recent years, we have witnessed rapid advances in data-acquisition techniques in many areas, including biological domains, engineering and social sciences. As a result, new statistical and machine learning techniques are needed to help us develop a better understanding of the complexities underlying large, noisy data sets. Networks have been commonly used to abstract noisy data and provide insight into regularities and dependencies between observed variables. For example, in a biological study, nodes of the network can represent genes in one organism and edges can represent associations or regulatory dependencies among genes. In a social domain, nodes of a network can represent actors and edges can represent interactions between actors. Recent popular techniques for modeling and exploring networks are based on structure estimation in probabilistic graphical models, specifically, Markov Random Fields (MRFs).
A Semiparametric Bayesian Extreme Value Model Using a Dirichlet Process Mixture of Gamma Densities
In recent years, extreme value mixture models have been proposed as a combination of a distribution for the "bulk part" below a threshold and a generalized Pareto distribution (GPD) in the tail. Different distributions have been proposed for modelling the "bulk part", where the threshold is a parameter to be estimated. The first approach that allows a transition between the bulk and tail parts is provided by Frigessi, Haug & Harvard (2003). Frigessi et al. (2003) use a Weibull distribution in the bulk part, a GPD for the tail and the location-scale Cauchy cdf in the transition function, and they use maximum likelihood estimation. However, in the Frigessi et al. (2003) approach, maximum likelihood estimation in the bulk part could produce multiple modes and hence some identifiability problems. Behrens, Lopez & Gammerman (2004) and Carreu & Bengio (2009) consider Gamma and Normal distributions, respectively, in the bulk part.
On the definition of a confounder
VanderWeele, Tyler J., Shpitser, Ilya
The causal inference literature has provided a clear formal definition of confounding expressed in terms of counterfactual independence. The literature has not, however, come to any consensus on a formal definition of a confounder, as it has given priority to the concept of confounding over that of a confounder. We consider a number of candidate definitions arising from various more informal statements made in the literature. We consider the properties satisfied by each candidate definition, principally focusing on (i) whether under the candidate definition control for all "confounders" suffices to control for "confounding" and (ii) whether each confounder in some context helps eliminate or reduce confounding bias. Several of the candidate definitions do not have these two properties. Only one candidate definition of those considered satisfies both properties. We propose that a "confounder" be defined as a pre-exposure covariate C for which there exists a set of other covariates X such that the effect of the exposure on the outcome is unconfounded conditional on (X,C) but such that for no proper subset of (X,C) is the effect of the exposure on the outcome unconfounded given the subset. We also provide a conditional analogue of the above definition, and we propose that a variable that helps reduce but not eliminate bias be referred to as a "surrogate confounder." These definitions are closely related to those given by Robins and Morgenstern [Comput. Math. Appl. 14 (1987) 869-916]. The implications that hold among the various candidate definitions are discussed.