AITopics

Conjugate pairs of distributions over infinite dimensional spaces are prominent in statistical learning theory, particularly due to the widespread adoption of Bayesian nonparametric methodologies for a host of models and applications. Much of the existing literature in the learning community focuses on processes possessing some form of computationally tractable conjugacy as is the case for the beta and gamma processes (and, via normalization, the Dirichlet process). For these processes, proofs of conjugacy and requisite derivation of explicit computational formulae for posterior density parameters are idiosyncratic to the stochastic process in question. As such, Bayesian Nonparametric models are currently available for a limited number of conjugate pairs, e.g. the Dirichlet-multinomial and beta-Bernoulli process pairs. In each of these above cases the likelihood process belongs to the class of discrete exponential family distributions. The exclusion of continuous likelihood distributions from the known cases of Bayesian Nonparametric Conjugate models stands as a disparity in the researcher's toolbox. In this paper we first address the problem of obtaining a general construction of prior distributions over infinite dimensional spaces possessing distributional properties amenable to conjugacy. Second, we bridge the divide between the discrete and continuous likelihoods by illustrating a canonical construction for stochastic processes whose Levy measure densities are from positive exponential families, and then demonstrate that these processes in fact form the prior, likelihood, and posterior in a conjugate family. Our canonical construction subsumes known computational formulae for posterior density parameters in the cases where the likelihood is from a discrete distribution belonging to an exponential family.

artificial intelligence, exponential family, machine learning, (18 more...)

1601.02257

Country:

Europe (0.92)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Kim, Young-Min, Velcin, Julien, Bonnevay, Stéphane, Rizoiu, Marian-Andrei

Temporal Multinomial Mixture for Instance-Oriented Evolutionary Clustering

Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data that are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of classical mixture model that optimizes feature co-occurrences in the trade-off with temporal smoothness. Our model is evaluated for two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instance-oriented clustering.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.1007/978-3-319-16354-3_66

1601.023

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Berthold, Michael R., Höppner, Frank

On Clustering Time Series Using Euclidean Distance and Pearson Correlation

For time series comparisons, it has often been observed that z-score normalized Euclidean distances far outperform the unnormalized variant. In this paper we show that a z-score normalized, squared Euclidean Distance is, in fact, equal to a distance based on Pearson Correlation. This has profound impact on many distance-based classification or clustering methods. In addition to this theoretically sound result we also show that the often used k-Means algorithm formally needs a mod ification to keep the interpretation as Pearson correlation strictly valid. Experimental results demonstrate that in many cases the standard k-Means algorithm generally produces the same results.

artificial intelligence, euclidean distance, machine learning, (12 more...)

1601.02213

Country: Europe > Germany (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.85)

Goessling, Marc, Kang, Shan

Directional Decision Lists

In this paper we introduce a novel family of decision lists consisting of highly interpretable models which can be learned efficiently in a greedy manner. The defining property is that all rules are oriented in the same direction. Particular examples of this family are decision lists with monotonically decreasing (or increasing) probabilities. On simulated data we empirically confirm that the proposed model family is easier to train than general decision lists. We exemplify the practical usability of our approach by identifying problem symptoms in a manufacturing process.

artificial intelligence, decision list, machine learning, (18 more...)

1508.07643

Country: North America > United States (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
(2 more...)

Bayesian Optimization in a Billion Dimensions via Random Embeddings

Wang, Ziyu, Hutter, Frank, Zoghi, Masrour, Matheson, David, de Freitas, Nando

Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high-dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple, has important invariance properties, and applies to domains with both categorical and continuous variables. We present a thorough theoretical analysis of REMBO. Empirical results confirm that REMBO can effectively solve problems with billions of dimensions, provided the intrinsic dimensionality is low. They also show that REMBO achieves state-of-the-art performance in optimizing the 47 discrete parameters of a popular mixed integer linear programming solver.

artificial intelligence, machine learning, optimization, (19 more...)

1301.1942

Country:

North America > Canada (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Jain, Shantanu, White, Martha, Trosset, Michael W., Radivojac, Predrag

Nonparametric semi-supervised learning of class proportions

The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and unlabeled data, and (ii) the accurate estimation of the prior probabilities of positive and negative examples. In this work we primarily focus on the latter subproblem. We study nonparametric class prior estimation and formulate this problem as an estimation of mixing proportions in two-component mixture models, given a sample from one of the components and another sample from the mixture itself. We show that estimation of mixing proportions is generally ill-defined and propose a canonical form to obtain identifiability while maintaining the flexibility to model any distribution. We use insights from this theory to elucidate the optimization surface of the class priors and propose an algorithm for estimating them. To address the problems of high-dimensional density estimation, we provide practical transformations to low-dimensional spaces that preserve class priors. Finally, we demonstrate the efficacy of our method on univariate and multivariate data.

artificial intelligence, machine learning, proportion, (19 more...)

1601.01944

Country: North America > United States > Indiana (0.14)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)
(2 more...)

Flynn, Cheryl J., Perry, Patrick O.

Consistent Biclustering

Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in a high-dimensional dataset. Many biclustering procedures appear to work well in practice, but most do not have associated consistency guarantees. To address this shortcoming, we propose a new biclustering procedure based on profile likelihood. The procedure applies to a broad range of data modalities, including binary, count, and continuous observations. We prove that the procedure recovers the true row and column classes when the dimensions of the data matrix tend to infinity, even if the functional form of the data distribution is misspecified. The procedure requires computing a combinatorial search, which can be expensive in practice. Rather than performing this search directly, we propose a new heuristic optimization procedure based on the Kernighan-Lin heuristic, which has nice computational properties and performs well in simulations. We demonstrate our procedure with applications to congressional voting records, and microarray analysis.

artificial intelligence, data mining, machine learning, (14 more...)

1206.6927

Country: North America > United States (1.00)

Genre: Research Report (0.63)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Heinze, Christina, McWilliams, Brian, Meinshausen, Nicolai

DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

We present Dual-Loco, a communicationefficient algorithm for distributed statistical estimation. Dual-Loco assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that Dual-Loco has bounded approximation error which only depends weakly on the number of workers. We compare Dual-Loco against a state-of-theart distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, Dual-Loco allows for fast cross validation as only part of the algorithm depends on the regularization parameter.

artificial intelligence, machine learning, optimization problem, (17 more...)

1506.02554

Country: Europe (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Gniazdowski, Zenon, Grabowski, Michal

Numerical Coding of Nominal Data

In this paper, a novel approach for coding nominal data is proposed. For the given nominal data, a rank in a form of complex number is assigned. The proposed method does not lose any information about the attribute and brings other properties previously unknown. The approach based on these knew properties can been used for classification. The analyzed example shows that classification with the use of coded nominal data or both numerical as well as coded nominal data is more effective than the classification, which uses only numerical data.

artificial intelligence, machine learning, nominal data, (16 more...)

doi: 10.26348/znwwsi.12.53

1601.01966

Genre: Research Report (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Benavoli, Alessio, Zaffalon, Marco

State Space representation of non-stationary Gaussian Processes

arXiv.org Machine LearningJan-7-2016

The state space (SS) representation of Gaussian processes (GP) has recently gained a lot of interest. The main reason is that it allows to compute GPs based inferences in O(n), where $n$ is the number of observations. This implementation makes GPs suitable for Big Data. For this reason, it is important to provide a SS representation of the most important kernels used in machine learning. The aim of this paper is to show how to exploit the transient behaviour of SS models to map non-stationary kernels to SS models.

artificial intelligence, machine learning, ss model, (17 more...)

1601.01544

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)