AITopics

1309.2303

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Data Science > Data Mining (0.88)

Subedi, Sanjeena, McNicholas, Paul D.

Variational Bayes Approximations for Clustering via Mixtures of Normal Inverse Gaussian Distributions

arXiv.org Machine Learning, Sep-7-2013

The use of mixture models for clustering, referred to as model-based clustering, has become increasingly popular since the work of Wolfe (1963). A wide variety of finite mixture models has been studied extensively within the literature to date. Amongst these, the Gaussian mixture model has received special attention due to its mathematical tractability and the relative computational simplicity associated with parameter estimation. However, the Gaussian mixture model is not without limitations; for instance, the component densities are restricted to being symmetric.
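For reference, here is a minimal sketch of model-based clustering in the Gaussian case the abstract contrasts with: a univariate Gaussian mixture fitted by expectation-maximization (EM). It is not the paper's variational Bayes normal inverse Gaussian method. It only shows why Gaussian parameter estimation is considered simple (every M-step update is closed-form) and why each component is symmetric about its mean. The function names `normal_pdf` and `gmm_em`, the random initialization, and the variance floor are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of the univariate Gaussian N(mu, var) at x (symmetric about mu)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_em(data, k=2, iters=100, seed=0):
    """Fit a k-component univariate Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, variances, labels); labels are the
    maximum a posteriori component assignments of the data points.
    """
    rng = random.Random(seed)
    n = len(data)
    means = rng.sample(data, k)  # assumption: start means at random data points
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of component g for point x (posterior probability).
        resp = []
        for x in data:
            dens = [weights[g] * normal_pdf(x, means[g], variances[g]) for g in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: closed-form weighted updates of weights, means and variances.
        for g in range(k):
            n_g = sum(r[g] for r in resp)
            weights[g] = n_g / n
            means[g] = sum(r[g] * x for r, x in zip(resp, data)) / n_g
            var_g = sum(r[g] * (x - means[g]) ** 2 for r, x in zip(resp, data)) / n_g
            variances[g] = max(var_g, 1e-6)  # assumption: floor avoids degenerate components
    labels = [max(range(k), key=lambda g: r[g]) for r in resp]
    return weights, means, variances, labels
```

Because each fitted component is a symmetric bell curve, a skewed cluster can end up being approximated by several Gaussian components, which is the kind of limitation that motivates skewed component densities.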

artificial intelligence, bayesian inference, machine learning

doi: 10.1007/s11634-014-0165-7

1309.1901

Country:

North America > Canada > Ontario (0.46)
North America > United States > California (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Dahlin, Johan, Svenson, Pontus

Ensemble approaches for improving community detection methods

arXiv.org Machine Learning, Sep-1-2013

Statistical estimates can often be improved by fusing data from several different sources. One example is so-called ensemble methods, which have been successfully applied in areas such as machine learning for classification and clustering. In this paper, we present an ensemble method to improve community detection by aggregating the information found in an ensemble of community structures. This ensemble can be found by re-sampling methods, by multiple runs of a stochastic community detection method, or by applying several different community detection algorithms to the same network. The proposed method is evaluated using random networks with community structures and compared with two commonly used community detection methods. When applied to a stochastic community detection algorithm, the proposed method performs well with low computational complexity, thus offering both a new approach to community detection and an additional community detection method.
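One common way to aggregate an ensemble of community structures is a co-association (consensus) rule: link two nodes when they share a community in a large enough fraction of the ensemble members, then report the connected components of the resulting graph. The sketch below assumes that rule, a hypothetical `threshold` of 0.5, and partitions given as label lists over nodes `0..n-1`; the paper's own aggregation step may differ. The function name `consensus_communities` is illustrative.

```python
from itertools import combinations


def consensus_communities(partitions, threshold=0.5):
    """Aggregate an ensemble of partitions of nodes 0..n-1 (illustrative sketch).

    partitions[r][i] is the community label of node i in ensemble member r.
    Label names need not agree across members; only co-membership is used.
    Nodes i and j are linked if they share a community in at least
    `threshold` of the members; consensus communities are the connected
    components of that co-association graph.
    """
    n = len(partitions[0])
    m = len(partitions)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        together = sum(1 for p in partitions if p[i] == p[j])
        if together / m >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],  # node 2 is misassigned in this run
    [5, 5, 5, 7, 7, 7],  # different label names, same structure
]
print(consensus_communities(runs))  # [0, 0, 0, 1, 1, 1]
```

Because components are taken transitively, a low threshold can chain communities together; raising it yields smaller, more conservative communities.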

community structure, data mining, machine learning

1309.0242

Country:

Europe > United Kingdom > England (0.46)
North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Miller, Jeffrey W., Harrison, Matthew T.

Inconsistency of Pitman-Yor process mixtures for the number of components

arXiv.org Machine Learning, Aug-30-2013

In population genetics, determining the "population structure" is an important step in the analysis of sampled data. As an illustrative example, consider the impala, a species of antelope in southern Africa. Impalas are divided into two subspecies: the common impala, occupying much of the eastern half of the region, and the black-faced impala, inhabiting a small area in the west. While common impalas are abundant, the number of black-faced impalas has been decimated by drought, poaching, and declining resources due to human and livestock expansion. To assist conservation efforts, Lorenzen, Arctander and Siegismund (2006) collected samples from 216 impalas and analyzed the genetic variation between and within the two subspecies. A key part of their analysis consisted of inferring the population structure, that is, partitioning the data into distinct populations and, in particular, determining how many such populations there are. To infer the impala population structure, Lorenzen et al. employed a widely used tool called Structure (Pritchard, Stephens and Donnelly, 2000) which, in the simplest version, models the data as a finite mixture, with each component in the mixture corresponding to a distinct population.

Supported in part by NSF grant DMS-1007593 and DARPA contract FA8650-11-1-715.

artificial intelligence, exponential family, machine learning

1309.0024

Country:

North America > United States (0.66)
Africa (0.54)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Murray, Paula M., McNicholas, Paul D., Browne, Ryan P.

Mixtures of Common Skew-t Factor Analyzers

arXiv.org Machine Learning, Aug-30-2013

A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well-modelled by the mixtures of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our model is applied to real and simulated data. This paper marks the first time that skewed common factors have been used.

artificial intelligence, machine learning, mcstfa model

doi: 10.1002/sta4.43

1307.5558

Country: North America > Canada > Ontario (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Brunet, Camille, Loustau, Sébastien

The algorithm of noisy k-means

arXiv.org Machine Learning, Aug-15-2013

In this note, we introduce a new algorithm to deal with finite-dimensional clustering with errors in variables. The design of this algorithm is based on recent theoretical advances (see Loustau (2013a,b)) in statistical learning with errors in variables. As in the previously mentioned papers, the algorithm mixes different tools from the inverse problem literature and the machine learning community. Roughly, it is based on a two-step procedure: (1) a deconvolution step to deal with noisy inputs and (2) Newton iterations, as in the popular k-means algorithm.
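Step (2) refers to the familiar k-means iterations. The following is a minimal sketch of Lloyd's assignment/update loop. It assumes the inputs are used as given: the deconvolution step (1), which is what distinguishes noisy k-means, is omitted, so this is plain k-means rather than the authors' algorithm. The name `kmeans` and the random initialization are illustrative choices.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples (sketch).

    Returns (centers, labels); labels[i] is the index of point i's centre.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k random points as starting centres
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre in squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centre where it was
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, labels = kmeans(pts, 2)
```

Each assignment step and each update step can only lower (or keep) the within-cluster sum of squares, which is why the loop stops once the assignments no longer change.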

algorithm, artificial intelligence, machine learning

1308.3314

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Multi-View K-Means Clustering on Big Data

Cai, Xiao (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)

In the past decade, more and more data have been collected from multiple sources or represented by multiple views, where different views describe distinct perspectives of the data. Although each view could be used individually to find patterns by clustering, clustering could be more accurate if the rich information among multiple views is exploited. Several multi-view clustering methods have been proposed to integrate different views of data in an unsupervised manner. However, they are graph-based approaches (e.g., based on spectral clustering), so they cannot handle large-scale data. How to combine these heterogeneous features for unsupervised large-scale data clustering has become a challenging problem. In this paper, we propose a new robust large-scale multi-view clustering method to integrate heterogeneous representations of large-scale data. We evaluate the proposed methods on six benchmark data sets and compare their performance with several commonly used clustering approaches as well as baseline multi-view clustering methods. In all experiments, our proposed methods consistently achieve superior clustering performance.
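To illustrate what integrating views through a shared clustering can mean, here is a simplified sketch: every view keeps its own centroids, all views share one cluster assignment, and a sample's cost is a weighted sum of its squared distances across views. This is an assumption-laden stand-in, not the paper's robust method, whose details the abstract does not give; the fixed `weights` parameter and the function name `multiview_kmeans` are hypothetical.

```python
import random


def multiview_kmeans(views, k, weights=None, iters=100, seed=0):
    """Shared-assignment k-means over several feature views (sketch).

    views[v][i] is the feature tuple of sample i in view v; dimensions may
    differ between views. The cost of putting sample i in cluster c is
    sum over v of weights[v] * ||x_i^(v) - centre_c^(v)||^2.
    Returns the shared cluster labels.
    """
    n = len(views[0])
    n_views = len(views)
    weights = weights or [1.0] * n_views  # assumption: equal, fixed view weights
    rng = random.Random(seed)
    seeds = rng.sample(range(n), k)  # the same samples initialise every view
    centers = [[list(view[i]) for i in seeds] for view in views]  # centers[v][c]

    def cost(i, c):
        return sum(
            weights[v] * sum((a - b) ** 2 for a, b in zip(views[v][i], centers[v][c]))
            for v in range(n_views)
        )

    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: cost(i, c)) for i in range(n)]
        if new_labels == labels:  # shared assignment is stable
            break
        labels = new_labels
        # Update each view's centroids from the shared assignment.
        for v, view in enumerate(views):
            for c in range(k):
                members = [view[i] for i in range(n) if labels[i] == c]
                if members:
                    centers[v][c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


view_a = [(0.0,), (0.1,), (0.2,), (9.0,), (9.1,), (9.2,)]
view_b = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (-3.0, 4.0), (-3.1, 4.2), (-2.9, 3.9)]
labels = multiview_kmeans([view_a, view_b], 2)
```

Each iteration costs time linear in the number of samples, so this family of methods avoids building the n-by-n graph that spectral approaches require.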

big data, multi-view k-means clustering

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Liu, Jialu (University of Illinois at Urbana-Champaign) | Wang, Chi (University of Illinois at Urbana-Champaign) | Danilevsky, Marina (University of Illinois at Urbana-Champaign) | Han, Jiawei (University of Illinois at Urbana-Champaign)

Large-Scale Spectral Clustering on Graphs

Graph clustering has received growing attention in recent years as an important analytical technique, both because of the prevalence of graph data and because graph structures are useful for exploiting intrinsic data characteristics. However, as graph data grow in scale, it becomes increasingly challenging to identify clusters. In this paper, we propose an efficient clustering algorithm for large-scale graph data using spectral methods. The key idea is to repeatedly generate a small number of "supernodes" connected to the regular nodes, in order to compress the original graph into a sparse bipartite graph. By clustering the bipartite graph using spectral methods, we are able to greatly improve efficiency without significant loss of clustering power. Extensive experiments show the effectiveness and efficiency of our approach.

graph, large-scale spectral clustering

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Learning Finite Beta-Liouville Mixture Models via Variational Bayes for Proportional Data Clustering

Fan, Wentao (Concordia University) | Bouguila, Nizar (Concordia University)

learning finite beta-liouville mixture model, proportional data clustering, variational bayes

During the past decade, finite mixture modeling has become a well-established technique in data analysis and clustering. This paper focuses on developing a variational inference framework to learn finite Beta-Liouville mixture models, which have recently been proposed as an efficient way to cluster proportional data. In contrast to the conventional expectation maximization (EM) algorithm commonly used for learning finite mixture models, the proposed algorithm has the advantages of being more computationally efficient and of preventing over- and under-fitting problems. Moreover, the complexity of the mixture model (i.e., the number of components) can be determined automatically, simultaneously with parameter estimation, in closed form as part of the Bayesian inference procedure. The merits of the proposed approach are shown using both artificial data sets and two challenging real applications, namely dynamic texture clustering and facial expression recognition.

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)

Algorithm Portfolios Based on Cost-Sensitive Hierarchical Clustering

Malitsky, Yuri (Cork Constraint Computation Centre) | Sabharwal, Ashish (IBM Watson Research Center) | Samulowitz, Horst (IBM Watson Research Center) | Sellmann, Meinolf (IBM Watson Research Center)

algorithm portfolio, cost-sensitive hierarchical clustering

Different solution approaches for combinatorial problems often exhibit incomparable performance that depends on the concrete problem instance to be solved. Algorithm portfolios aim to combine the strengths of multiple algorithmic approaches by training a classifier that selects or schedules solvers depending on the given instance. We devise a new classifier that selects solvers based on a cost-sensitive hierarchical clustering model. Experimental results on SAT and MaxSAT show that the new method outperforms the most effective portfolio builders to date.
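As a rough illustration of cost-sensitive clustering for solver selection, the sketch below splits training instances top-down on a single feature threshold. A split is kept only if giving each side its own best solver lowers the total runtime, and a new instance is routed down the tree to a leaf's solver. The axis-aligned splits, the `min_size` parameter, and the names `best_solver`, `build_tree` and `select_solver` are assumptions for this sketch; it is not the authors' algorithm.

```python
def best_solver(rows, runtimes):
    """Return (solver_index, total_runtime) of the single best solver on rows."""
    n_solvers = len(runtimes[0])
    totals = [sum(runtimes[r][s] for r in rows) for s in range(n_solvers)]
    s = min(range(n_solvers), key=totals.__getitem__)
    return s, totals[s]


def build_tree(rows, features, runtimes, min_size=2):
    """Recursively split instances on one feature threshold (sketch).

    A split is kept only if letting each side use its own best solver
    lowers the total runtime cost: this is the cost-sensitive criterion.
    Leaves are ("leaf", solver); internal nodes are
    ("split", feature, threshold, left_subtree, right_subtree).
    """
    solver, cost = best_solver(rows, runtimes)
    best = None  # (cost, feature, threshold, left_rows, right_rows)
    for f in range(len(features[0])):
        values = sorted({features[r][f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between adjacent values
            left = [r for r in rows if features[r][f] <= t]
            right = [r for r in rows if features[r][f] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            split_cost = best_solver(left, runtimes)[1] + best_solver(right, runtimes)[1]
            if split_cost < cost and (best is None or split_cost < best[0]):
                best = (split_cost, f, t, left, right)
    if best is None:
        return ("leaf", solver)
    _, f, t, left, right = best
    return ("split", f, t,
            build_tree(left, features, runtimes, min_size),
            build_tree(right, features, runtimes, min_size))


def select_solver(tree, x):
    """Route a new instance's feature vector x down the tree to a solver index."""
    while tree[0] == "split":
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]


features = [(0.1,), (0.2,), (0.3,), (0.7,), (0.8,), (0.9,)]
# runtimes[i][s]: solver 0 is fast on small-feature instances, solver 1 on large ones.
runtimes = [[1, 10], [1, 10], [2, 9], [10, 1], [9, 2], [10, 1]]
tree = build_tree(list(range(6)), features, runtimes)
```

Measuring split quality in runtime rather than in feature-space distance means the tree only separates instances when doing so changes which solver should run.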

Twenty-Third International Joint Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)