Goto

Collaborating Authors

 Statistical Learning


A Sufficient Statistics Construction of Bayesian Nonparametric Exponential Family Conjugate Models

arXiv.org Machine Learning

Conjugate pairs of distributions over infinite dimensional spaces are prominent in statistical learning theory, particularly due to the widespread adoption of Bayesian nonparametric methodologies for a host of models and applications. Much of the existing literature in the learning community focuses on processes possessing some form of computationally tractable conjugacy as is the case for the beta and gamma processes (and, via normalization, the Dirichlet process). For these processes, proofs of conjugacy and requisite derivation of explicit computational formulae for posterior density parameters are idiosyncratic to the stochastic process in question. As such, Bayesian Nonparametric models are currently available for a limited number of conjugate pairs, e.g. the Dirichlet-multinomial and beta-Bernoulli process pairs. In each of these above cases the likelihood process belongs to the class of discrete exponential family distributions. The exclusion of continuous likelihood distributions from the known cases of Bayesian Nonparametric Conjugate models stands as a disparity in the researcher's toolbox. In this paper we first address the problem of obtaining a general construction of prior distributions over infinite dimensional spaces possessing distributional properties amenable to conjugacy. Second, we bridge the divide between the discrete and continuous likelihoods by illustrating a canonical construction for stochastic processes whose Levy measure densities are from positive exponential families, and then demonstrate that these processes in fact form the prior, likelihood, and posterior in a conjugate family. Our canonical construction subsumes known computational formulae for posterior density parameters in the cases where the likelihood is from a discrete distribution belonging to an exponential family.


Temporal Multinomial Mixture for Instance-Oriented Evolutionary Clustering

arXiv.org Machine Learning

Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data that are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of classical mixture model that optimizes feature co-occurrences in the trade-off with temporal smoothness. Our model is evaluated for two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instance-oriented clustering.


On Clustering Time Series Using Euclidean Distance and Pearson Correlation

arXiv.org Machine Learning

For time series comparisons, it has often been observed that z-score normalized Euclidean distances far outperform the unnormalized variant. In this paper we show that a z-score normalized, squared Euclidean Distance is, in fact, equal to a distance based on Pearson Correlation. This has profound impact on many distance-based classification or clustering methods. In addition to this theoretically sound result we also show that the often used k-Means algorithm formally needs a mod ification to keep the interpretation as Pearson correlation strictly valid. Experimental results demonstrate that in many cases the standard k-Means algorithm generally produces the same results.


Directional Decision Lists

arXiv.org Machine Learning

In this paper we introduce a novel family of decision lists consisting of highly interpretable models which can be learned efficiently in a greedy manner. The defining property is that all rules are oriented in the same direction. Particular examples of this family are decision lists with monotonically decreasing (or increasing) probabilities. On simulated data we empirically confirm that the proposed model family is easier to train than general decision lists. We exemplify the practical usability of our approach by identifying problem symptoms in a manufacturing process.


Bayesian Optimization in a Billion Dimensions via Random Embeddings

arXiv.org Machine Learning

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high-dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.


Nonparametric semi-supervised learning of class proportions

arXiv.org Machine Learning

The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples. In this work we primarily focus on the latter subproblem. We study nonparametric class prior estimation and formulate this problem as an estimation of mixing proportions in two-component mixture models, given a sample from one of the components and another sample from the mixture itself. We show that estimation of mixing proportions is generally ill-defined and propose a canonical form to obtain identifiability while maintaining the flexibility to model any distribution. We use insights from this theory to elucidate the optimization surface of the class priors and propose an algorithm for estimating them. To address the problems of high-dimensional density estimation, we provide practical transformations to low-dimensional spaces that preserve class priors. Finally, we demonstrate the efficacy of our method on univariate and multivariate data.


Consistent Biclustering

arXiv.org Machine Learning

Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. Many biclustering procedures appear to work well in practice, but most do not have associated consistency guarantees. To address this shortcoming, we propose a new biclustering procedure based on profile likelihood. The procedure applies to a broad range of data modalities, including binary, count, and continuous observations. We prove that the procedure recovers the true row and column classes when the dimensions of the data matrix tend to infinity, even if the functional form of the data distribution is misspecified. The procedure requires computing a combinatorial search, which can be expensive in practice. Rather than performing this search directly, we propose a new heuristic optimization procedure based on the Kernighan-Lin heuristic, which has nice computational properties and performs well in simulations. We demonstrate our procedure with applications to congressional voting records, and microarray analysis.


DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

arXiv.org Machine Learning

We present Dual-Loco, a communicationefficient algorithm for distributed statistical estimation. Dual-Loco assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that Dual-Loco has bounded approximation error which only depends weakly on the number of workers. We compare Dual-Loco against a state-of-theart distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, Dual-Loco allows for fast cross validation as only part of the algorithm depends on the regularization parameter.


Numerical Coding of Nominal Data

arXiv.org Machine Learning

In this paper, a novel approach for coding nominal data is proposed. For the given nominal data, a rank in a form of complex number is assigned. The proposed method does not lose any information about the attribute and brings other properties previously unknown. The approach based on these knew properties can been used for classification. The analyzed example shows that classification with the use of coded nominal data or both numerical as well as coded nominal data is more effective than the classification, which uses only numerical data.


State Space representation of non-stationary Gaussian Processes

arXiv.org Machine Learning

The state space (SS) representation of Gaussian processes (GP) has recently gained a lot of interest. The main reason is that it allows to compute GPs based inferences in O(n), where $n$ is the number of observations. This implementation makes GPs suitable for Big Data. For this reason, it is important to provide a SS representation of the most important kernels used in machine learning. The aim of this paper is to show how to exploit the transient behaviour of SS models to map non-stationary kernels to SS models.