AITopics

1207.4155

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Shortreed, Susan, Meila, Marina

Unsupervised spectral learning

arXiv.org Machine Learning Jul-4-2012

In spectral clustering and spectral image segmentation, the data is partitioned starting from a given matrix of pairwise similarities S. The matrix S is constructed by hand, or learned on a separate training set. In this paper we show how to achieve spectral clustering in unsupervised mode. Our algorithm starts with a set of observed pairwise features, which are possible components of an unknown, parametric similarity function. This function is learned iteratively, at the same time as the clustering of the data. The algorithm shows promising results on synthetic and real data.
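For orientation, the conventional pipeline that the abstract takes as its starting point (clustering from a fixed, given similarity matrix S) can be sketched as below. This is a minimal illustration, not the authors' unsupervised algorithm. The function name `spectral_bipartition`, the toy matrix, and the choice of a two-way split based on the normalized Laplacian are all assumptions made here.

```python
import numpy as np

def spectral_bipartition(S):
    """Split items into two groups from a symmetric similarity matrix S.

    Hypothetical helper for illustration; assumes every row of S has a
    positive sum so the degree normalization is well defined.
    """
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                      # degree of each item
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue, rescaled by D^{-1/2}
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Toy similarity matrix (made up): items 0-2 are mutually similar,
# items 3-5 are mutually similar, and the two groups are weakly linked.
S = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 1.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 1.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 1.0, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.9, 1.0],
])
labels = spectral_bipartition(S)
```

The sign of an eigenvector is arbitrary, so only the grouping in `labels` is meaningful, not which group is called 0 or 1.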

algorithm, artificial intelligence, machine learning, (18 more...)

1207.1358

Country: North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

Blanchard, Frédéric, Herbin, Michel

Relational Data Mining Through Extraction of Representative Exemplars

arXiv.org Machine Learning Jul-3-2012

With the growing interest in Network Analysis, Relational Data Mining is becoming a prominent domain of Data Mining. This paper addresses the problem of extracting representative elements from a relational dataset. After defining the notion of degree of representativeness, computed using the Borda aggregation procedure, we present the extraction of exemplars, which are the representative elements of the dataset. We use these concepts to build a network on the dataset. We present the main properties of these notions and propose two typical applications of our framework. The first application consists of summarizing and structuring a set of binary images, and the second of mining co-authorship relations in a research team.
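One way to picture a Borda-style degree of representativeness is sketched below. It is an illustration under stated assumptions, not necessarily the authors' exact definition: each object acts as a voter that ranks all other objects from closest to farthest, and an object's score is the sum of the Borda points it receives. The function name `borda_representativeness` and the toy data are hypothetical.

```python
def borda_representativeness(D):
    """Borda scores from a square dissimilarity matrix D (list of lists).

    Assumption for this sketch: voter i awards n-2 points to its closest
    neighbour, n-3 to the next, and 0 to the farthest object.
    """
    n = len(D)
    scores = [0] * n
    for i in range(n):
        # Voter i ranks every other object by increasing dissimilarity.
        ranking = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])
        for rank, j in enumerate(ranking):
            scores[j] += (n - 2) - rank
    return scores

# Toy data (made up): four points on a line, with one far-away outlier.
points = [0, 1, 3, 10]
D = [[abs(a - b) for b in points] for a in points]

scores = borda_representativeness(D)
exemplar = max(range(len(points)), key=lambda j: scores[j])
print(scores, exemplar)  # [3, 5, 4, 0] 1
```

The point at position 1 receives the highest score because it ranks near the top of most other points' lists, while the outlier at 10 is ranked last by everyone.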

data mining, exemplar, machine learning, (17 more...)

1207.0833

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Beal, Matthew, Krishnamurthy, Praveen

Gene Expression Time Course Clustering with Countably Infinite Hidden Markov Models

It is said that genes that cluster with similar expression (that is, are co-expressed) serve similar functional roles in a process (see, for example, Eisen et al. 1998). Bioinformaticians have more recently had access to sets of time-series measurements of genes' expression over the duration of an experiment, and have desired therefore to learn not just co-expression, but causal relationships that may help elucidate co-regulation as well. Two problematic issues hamper practical methods for clustering gene expression time course data: first, if deriving a model-based clustering metric, it is often unclear what the appropriate model complexity should be; second, the current clustering algorithms available cannot handle, and therefore disregard, the temporal information. This usually occurs when constructing a metric for the distance between any two such genes. The common practice for an experiment having T measurements of a gene's expression over time is to consider the expression as positioned in a T-dimensional space, and to perform (at worst spherical metric) clustering in that space. The result is that the clustering algorithm is invariant to arbitrary permutations of the time points, which is highly undesirable since we would like to take into account the correlations between all the genes' expression at nearby or adjacent time points.
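The permutation-invariance problem described above is easy to check numerically. The sketch below uses made-up expression values and a hypothetical helper `euclidean`. It shuffles the time points of two gene profiles with the same permutation and compares the Euclidean distance before and after.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up expression levels for two genes at T = 5 time points.
gene_a = [0.1, 0.5, 1.2, 2.0, 1.1]
gene_b = [0.0, 0.7, 1.0, 1.8, 1.5]

# Apply the same arbitrary reordering of time points to both genes.
perm = list(range(len(gene_a)))
random.Random(0).shuffle(perm)
shuffled_a = [gene_a[t] for t in perm]
shuffled_b = [gene_b[t] for t in perm]

d_original = euclidean(gene_a, gene_b)
d_shuffled = euclidean(shuffled_a, shuffled_b)

# The two distances agree up to floating-point rounding, so a clustering
# built on this metric cannot see the temporal ordering.
print(math.isclose(d_original, d_shuffled))
```

Any metric that treats the T measurements as unordered coordinates behaves this way, which is the motivation the abstract gives for a model that uses the time ordering explicitly.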

artificial intelligence, machine learning, mixture model, (14 more...)

1206.6824

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

An Iterative Locally Linear Embedding Algorithm

Kong, Deguang, Ding, Chris H. Q., Huang, Heng, Nie, Feiping

Locally Linear Embedding (LLE) is a popular dimension reduction method. In this paper, we first show that LLE with a nonnegativity constraint is equivalent to the widely used Laplacian embedding. We further propose to iterate the two steps in LLE repeatedly to improve the results. Thirdly, we relax the kNN constraint of LLE and present a sparse similarity learning algorithm. The final Iterative LLE combines these three improvements. Extensive experimental results show that the iterative LLE algorithm significantly improves both classification and clustering results.

algorithm, artificial intelligence, machine learning, (16 more...)

1206.6463

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Varoquaux, Gael, Gramfort, Alexandre, Thirion, Bertrand

Small-sample Brain Mapping: Sparse Recovery on Spatially Correlated Designs with Randomization and Clustering

Functional neuroimaging can measure the brain's response to an external stimulus. It is used to perform brain mapping: identifying from these observations the brain regions involved. This problem can be cast into a linear supervised learning task where the neuroimaging data are used as predictors for the stimulus. Brain mapping is then seen as a support recovery problem. On functional MRI (fMRI) data, this problem is particularly challenging as i) the number of samples is small due to limited acquisition time and ii) the variables are strongly correlated. We propose to overcome these difficulties using sparse regression models over new variables obtained by clustering of the original variables. The use of randomization techniques, e.g. bootstrap samples, and clustering of the variables improves the recovery properties of sparse methods. We demonstrate the benefit of our approach on an extensive simulation study as well as two fMRI datasets.

artificial intelligence, machine learning, recovery, (17 more...)

1206.6447

Country: Europe (0.28)

Genre: Research Report (0.85)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Telgarsky, Matus, Dasgupta, Sanjoy

Agglomerative Bregman Clustering

This manuscript develops the theory of agglomerative clustering with Bregman divergences. Geometric smoothing techniques are developed to deal with degenerate clusters. To allow for cluster models based on exponential families with overcomplete representations, Bregman divergences are developed for nondifferentiable convex functions.
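For readers unfamiliar with the term, the classical Bregman divergence of a differentiable convex function φ is D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩. The sketch below covers only this standard differentiable case, not the nondifferentiable extension the manuscript develops. The helper names (`bregman`, `sq_norm`, `neg_entropy`) and the test vectors are assumptions made here.

```python
import math

def bregman(phi, grad_phi, x, y):
    """Classical Bregman divergence D_phi(x, y) for differentiable phi."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

def sq_norm(v):
    return sum(vi * vi for vi in v)

def sq_norm_grad(v):
    return [2.0 * vi for vi in v]

def neg_entropy(v):
    # Requires strictly positive entries.
    return sum(vi * math.log(vi) for vi in v)

def neg_entropy_grad(v):
    return [math.log(vi) + 1.0 for vi in v]

# Made-up probability vectors.
x = [0.2, 0.8]
y = [0.5, 0.5]

# phi = squared norm gives the squared Euclidean distance.
d_sq = bregman(sq_norm, sq_norm_grad, x, y)
# phi = negative entropy gives the KL divergence when x and y each sum to 1.
d_kl = bregman(neg_entropy, neg_entropy_grad, x, y)
```

Both familiar distances arise from the same formula by changing φ, which is the property that lets a single clustering procedure serve a whole family of cluster models.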

artificial intelligence, bregman divergence, machine learning, (16 more...)

1206.6446

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Rey, Melanie, Roth, Volker

Copula Mixture Model for Dependency-seeking Clustering

We introduce a copula mixture model to perform dependency-seeking clustering when co-occurring samples from different data sources are available. The model takes advantage of the great flexibility offered by the copulas framework to extend mixtures of Canonical Correlation Analysis to multivariate data with arbitrary continuous marginal densities. We formulate our model as a non-parametric Bayesian mixture, while providing efficient MCMC inference. Experiments on synthetic and real data demonstrate that the increased flexibility of the copula mixture significantly improves the clustering and the interpretability of the results.

artificial intelligence, dependency, machine learning, (15 more...)

1206.6433

Country:

Europe (0.46)
North America (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Krishnamurthy, Akshay, Balakrishnan, Sivaraman, Xu, Min, Singh, Aarti

Efficient Active Algorithms for Hierarchical Clustering

arXiv.org Machine Learning Jun-18-2012

Advances in sensing technologies and the growth of the internet have resulted in an explosion in the size of modern datasets, while storage and processing power continue to lag behind. This motivates the need for algorithms that are efficient, both in terms of the number of measurements needed and running time. To combat the challenges associated with large datasets, we propose a general framework for active hierarchical clustering that repeatedly runs an off-the-shelf clustering algorithm on small subsets of the data and comes with guarantees on performance, measurement complexity and runtime complexity. We instantiate this framework with a simple spectral clustering algorithm and provide concrete results on its performance, showing that, under some assumptions, this algorithm recovers all clusters of size Ω(log n) using O(n log^2 n) similarities and runs in O(n log^3 n) time for a dataset of n objects. Through extensive experimentation we also demonstrate that this framework is practically alluring.

algorithm, artificial intelligence, machine learning, (15 more...)

1206.4672

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Jun-18-2012

Groupwise Constrained Reconstruction for Subspace Clustering

Li, Ruijiang, Li, Bin, Zhang, Ke, Jin, Cheng, Xue, Xiangyang

Reconstruction-based subspace clustering methods compute a self-reconstruction matrix over the samples and use it for spectral clustering to obtain the final clustering result. Their success largely relies on the assumption that the underlying subspaces are independent, which, however, does not always hold in applications with an increasing number of subspaces. In this paper, we propose a novel reconstruction-based subspace clustering model without making the subspace independence assumption. In our model, certain properties of the reconstruction matrix are explicitly characterized using the latent cluster indicators, and the affinity matrix used for spectral clustering can be directly built from the posterior of the latent cluster indicators instead of the reconstruction matrix. Experimental results on both synthetic and real-world datasets show that the proposed model can outperform the state-of-the-art methods.
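The baseline idea that the abstract contrasts itself with (build a self-reconstruction matrix, then turn it into an affinity for spectral clustering) can be sketched as follows. This is a ridge-regularized least-squares variant chosen here for brevity, not the model proposed in the paper. The function name `self_reconstruction_affinity`, the regularization weight `lam`, and the toy data are assumptions.

```python
import numpy as np

def self_reconstruction_affinity(X, lam=0.1):
    """Affinity from a ridge-regularized self-reconstruction of the samples.

    X has one sample per row. Solves min_C ||X^T - X^T C||_F^2 + lam ||C||_F^2,
    whose closed form is C = (G + lam I)^{-1} G with G = X X^T.
    """
    X = np.asarray(X, dtype=float)
    G = X @ X.T                                    # Gram matrix of the samples
    C = np.linalg.solve(G + lam * np.eye(len(X)), G)
    return np.abs(C) + np.abs(C.T)                 # symmetric, nonnegative affinity

# Toy data (made up): three samples on the x-axis and three on the y-axis,
# i.e. two independent one-dimensional subspaces of R^3.
X = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, -2.0, 0.0],
])
A = self_reconstruction_affinity(X)
```

In this toy case the subspaces are independent, so the affinity is block-diagonal and spectral clustering on `A` would separate the two groups cleanly. That independence is the assumption the abstract says does not always hold in practice.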

artificial intelligence, machine learning, subspace, (14 more...)

1206.4644

Country: Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)