AITopics

Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a nonnegative matrix factorization problem of a mixture of similarity measurements. The tradeoff between the informativeness of data sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, a stability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets.

factorization, information source, similarity, (14 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.05)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Krupka, Eyal, Tishby, Naftali

Generalization in Clustering with Unobserved Features

We argue that when objects are characterized by many attributes, clustering them on the basis of a relatively small random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove a finite sample generalization theorems for this novel learning scheme that extends analogous results from the supervised learning setting. The scheme is demonstrated for collaborative filtering of users with movies rating as attributes.

algorithm, information, unobserved feature, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

A Probabilistic Approach for Optimizing Spectral Clustering

Jin, Rong, Kang, Feng, Ding, Chris H.

Spectral clustering enjoys its success in both data clustering and semisupervised learning. But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems. Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. In this paper, we present a new spectral clustering algorithm, named "Soft Cut". It improves the normalized cut algorithm by introducing soft membership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm.

algorithm, cut algorithm, normalized cut algorithm, (16 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > Michigan > Ingham County > Lansing (0.04)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Frey, Brendan J., Dueck, Delbert

Mixture Modeling by Affinity Propagation

Clustering is a fundamental problem in machine learning and has been approached in many ways. Two general and quite different approaches include iteratively fitting a mixture model (e.g., using EM) and linking together pairs of training cases that have high affinity (e.g., using spectral methods). Pairwise clustering algorithms need not compute sufficient statistics and avoid poor solutions by directly placing similar examples in the same cluster. However, many applications require that each cluster of data be accurately described by a prototype or model, so affinity-based clustering - and its benefits - cannot be directly realized. We describe a technique called "affinity propagation", which combines the advantages of both approaches. The method learns a mixture model of the data by recursively propagating affinity messages. We demonstrate affinity propagation on the problems of clustering image patches for image segmentation and learning mixtures of gene expression models from microarray data. We find that affinity propagation obtains better solutions than mixtures of Gaussians, the K-medoids algorithm, spectral clustering and hierarchical clustering, and is both able to find a pre-specified number of clusters and is able to automatically determine the number of clusters. Interestingly, affinity propagation can be viewed as belief propagation in a graphical model that accounts for pairwise training case likelihood functions and the identification of cluster centers.

affinity propagation, algorithm, exemplar, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Barber, David, Agakov, Felix V.

Kernelized Infomax Clustering

We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets.

cluster allocation, clustering, gaussian mixture, (16 more...)

Country:

Asia > Middle East > Jordan (0.05)
Europe > United Kingdom (0.04)
Europe > Switzerland (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Size Regularized Cut for Data Clustering

Chen, Yixin, Zhang, Ya, Ji, Xiang

We present a novel spectral clustering method that enables users to incorporate priorknowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring therelative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NPcomplete. An approximation algorithm isproposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate thatthe method is not sensitive to outliers and performs better than normalized cut.

cost function, partition, srcut, (14 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Kansas (0.04)
(2 more...)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Soft Clustering on Graphs

Yu, Kai, Yu, Shipeng, Tresp, Volker

Finally we provide very encouraging experimental results.

artificial intelligence, data mining, machine learning, (20 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.99)
Information Technology > Data Science > Data Mining (0.95)

Lange, Tilman, Buhmann, Joachim M.

Fusion of Similarity Data in Clustering

Fusing multiple information sources can yield significant benefits to successfully accomplishlearning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a nonnegative matrix factorization problem of a mixture ofsimilarity measurements. The tradeoff between the informativeness ofdata sources and the sparseness of their mixture is controlled by an entropy-based weighting mechanism. For the purpose of model selection, astability-based approach is employed to ensure the selection of the most self-consistent hypothesis. The experiments demonstrate the performance of the method on toy as well as real world data sets.

artificial intelligence, machine learning, similarity, (16 more...)

Country: Europe > Switzerland (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Krupka, Eyal, Tishby, Naftali

Generalization in Clustering with Unobserved Features

We argue that when objects are characterized by many attributes, clustering themon the basis of a relatively small random subset of these attributes can capture information on the unobserved attributes as well. Moreover, we show that under mild technical conditions, clustering the objects on the basis of such a random subset performs almost as well as clustering with the full attribute set. We prove a finite sample generalization theoremsfor this novel learning scheme that extends analogous results from the supervised learning setting. The scheme is demonstrated for collaborative filtering of users with movies rating as attributes.

artificial intelligence, information, machine learning, (15 more...)

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

A Probabilistic Approach for Optimizing Spectral Clustering

Jin, Rong, Kang, Feng, Ding, Chris H.

Spectral clustering enjoys its success in both data clustering and semisupervised learning.But, most spectral clustering algorithms cannot handle multi-class clustering problems directly. Additional strategies are needed to extend spectral clustering algorithms to multi-class clustering problems.Furthermore, most spectral clustering algorithms employ hard cluster membership, which is likely to be trapped by the local optimum. Inthis paper, we present a new spectral clustering algorithm, named "Soft Cut". It improves the normalized cut algorithm by introducing softmembership, and can be efficiently computed using a bound optimization algorithm. Our experiments with a variety of datasets have shown the promising performance of the proposed clustering algorithm.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States > Michigan (0.28)
North America > United States > California (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)