AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Community Detection in Networks: The Leader-Follower Algorithm

Shah, Devavrat, Zaman, Tauhid

arXiv.org Machine LearningNov-2-2010

Traditional spectral clustering methods cannot naturally learn the number of communities in a network and often fail to detect smaller community structure in dense networks because they are based upon external community connectivity properties such as graph cuts. We propose an algorithm for detecting community structure in networks called the leader-follower algorithm which is based upon the natural internal structure expected of communities in social networks. The algorithm uses the notion of network centrality in a novel manner to differentiate leaders (nodes which connect different communities) from loyal followers (nodes which only have neighbors within a single community). Using this approach, it is able to naturally learn the communities from the network structure and does not require the number of communities as an input, in contrast to other common methods such as spectral clustering. We prove that it will detect all of the communities exactly for any network possessing communities with the natural internal structure expected in social networks. More importantly, we demonstrate the effectiveness of the leader-follower algorithm in the context of various real networks ranging from social networks such as Facebook to biological networks such as an fMRI based human brain network. We find that the leader-follower algorithm finds the relevant community structure in these networks without knowing the number of communities beforehand. Also, because the leader-follower algorithm detects communities using their internal structure, we find that it can resolve a finer community structure in dense networks than common spectral clustering methods based on external community structure.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Machine Learning

1011.0774

Genre: Research Report (0.50)

Industry:

Information Technology > Services (0.96)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Significance of Classification Techniques in Prediction of Learning Disabilities

Balakrishnan, Julie M. David And Kannan

arXiv.org Artificial IntelligenceNov-2-2010

The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.

artificial intelligence, expert system, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.5121/ijaia.2010.1409

1011.0628

Country: Asia > India (0.49)

Genre: Research Report > New Finding (0.46)

Industry: Education > Focused Education > Special Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.93)
(2 more...)

Add feedback

Infinite Hierarchical MMSB Model for Nested Communities/Groups in Social Networks

Ho, Qirong, Parikh, Ankur P., Song, Le, Xing, Eric P.

arXiv.org Machine LearningOct-9-2010

Actors in realistic social networks play not one but a number of diverse roles depending on whom they interact with, and a large number of such role-specific interactions collectively determine social communities and their organizations. Methods for analyzing social networks should capture these multi-faceted role-specific interactions, and, more interestingly, discover the latent organization or hierarchy of social communities. We propose a hierarchical Mixed Membership Stochastic Blockmodel to model the generation of hierarchies in social communities, selective membership of actors to subsets of these communities, and the resultant networks due to within- and cross-community interactions. Furthermore, to automatically discover these latent structures from social networks, we develop a Gibbs sampling algorithm for our model. We conduct extensive validation of our model using synthetic networks, and demonstrate the utility of our model in real-world datasets such as predator-prey networks and citation networks.

artificial intelligence, machine learning, social media, (19 more...)

arXiv.org Machine Learning

1010.1868

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)

Genre: Research Report (0.40)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

Mahoney, Michael W.

arXiv.org Machine LearningOct-8-2010

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.

bioinformatics, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

1010.1609

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management > Search (0.93)
(4 more...)

Add feedback

Mixed-Membership Stochastic Block-Models for Transactional Networks

Shafiei, Mahdi, Chipman, Hugh

arXiv.org Artificial IntelligenceOct-7-2010

Transactional network data can be thought of as a list of one-to-many communications(e.g., email) between nodes in a social network. Most social network models convert this type of data into binary relations between pairs of nodes. We develop a latent mixed membership model capable of modeling richer forms of transactional network data, including relations between more than two nodes. The model can cluster nodes and predict transactions. The block-model nature of the model implies that groups can be characterized in very general ways. This flexible notion of group structure enables discovery of rich structure in transactional networks. Estimation and inference are accomplished via a variational EM algorithm. Simulations indicate that the learning algorithm can recover the correct generative model. Interesting structure is discovered in the Enron email dataset and another dataset extracted from the Reddit website. Analysis of the Reddit data is facilitated by a novel performance measure for comparing two soft clusterings. The new model is superior at discovering mixed membership in groups and in predicting transactions.

artificial intelligence, machine learning, node, (20 more...)

arXiv.org Artificial Intelligence

1010.1437

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Industry: Information Technology (0.96)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Add feedback

Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

Rossi, Fabrice, Villa-Vialaneix, Nathalie

arXiv.org Machine LearningSep-7-2010

This paper proposes an organized generalization of Newman and Girvan's modularity measure for graph clustering. Optimized via a deterministic annealing scheme, this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering induced graphs. Topographic graph clustering provides an alternative to more classical solutions in which a standard graph clustering method is applied to build a simpler graph that is then represented with a graph layout algorithm. A comparative study on four real world graphs ranging from 34 to 1 133 vertices shows the interest of the proposed approach with respect to classical solutions and to self-organizing maps for graphs.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neucom.2009.11.023

1009.1318

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

Seldin, Yevgeny

arXiv.org Machine LearningSep-2-2010

We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering a more accurate way to deal with finite sample issues. We derive a bound minimization algorithm and show that it provides good results in real-life problems and that the derived PAC-Bayesian bound is reasonably tight.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Machine Learning

1009.0499

Country: Europe > Germany (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A Phrase-Based Method for Hierarchical Clustering of Web Snippets

Li, Zhao (University of Vermont) | Wu, Xindong (University of Vermont)

AAAI ConferencesJul-15-2010

Document clustering has been applied in web information retrieval, which facilitates users’ quick browsing by organizing retrieved results into different groups. Meanwhile, a tree-like hierarchical structure is wellsuited for organizing the retrieved results in favor of web users. In this regard, we introduce a new method for hierarchical clustering of web snippets by exploiting a phrase-based document index. In our method, a hierarchy of web snippets is built based on phrases instead of all snippets, and the snippets are then assigned to the corresponding clusters consisting of phrases. We show that, as opposed to the traditional hierarchical clustering, our method not only presents meaningful cluster labels but also improves clustering performance.

artificial intelligence, machine learning, snippet, (17 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Vermont (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Learning from Concept Drifting Data Streams with Unlabeled Data

Li, Peipei (Hefei University of Technology) | Wu, Xindong (University of Vermont) | Hu, Xuegang (Hefei University of Technology)

AAAI ConferencesJul-15-2010

Contrary to the previous beliefs that all arrived streaming data are labeled and the class labels are immediately availa- ble, we propose a Semi-supervised classification algorithm for data streams with concept drifts and UNlabeled data, called SUN. SUN is based on an evolved decision tree. In terms of deviation between history concept clusters and new ones generated by a developed clustering algorithm of k-Modes, concept drifts are distinguished from noise at leaves. Extensive studies on both synthetic and real data demonstrate that SUN performs well compared to several known online algorithms on unlabeled data. A conclusion is hence drawn that a feasible reference framework is provided for tackling concept drifting data streams with unlabeled data.

algorithm, artificial intelligence, machine learning, (13 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Vermont > Chittenden County > Burlington (0.15)
Asia > China > Anhui Province > Hefei (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.37)

Add feedback

Interactive Categorization of Containers and Non-Containers by Unifying Categorizations Derived from Multiple Exploratory Behaviors

Griffith, Shane (Iowa State University) | Stoytchev, Alexander (Iowa State University)

AAAI ConferencesJul-15-2010

The ability to form object categories is an important milestone in human infant development (Cohen 2003). We propose a framework that allows a robot to form a unified object categorization from several interactions with objects. This framework is consistent with the principle that robot a) Drop Block b) Grasp c) Move learning should be ultimately grounded in the robot's perceptual and behavioral repertoire (Stoytchev 2009). This paper builds upon our previous work (Griffith et al. 2009) by adding more exploratory behaviors (now 6 instead of 1) and by employing consensus clustering for finding a single, unified object categorization. The framework was tested on a container/non-container categorization task with 20 objects.

artificial intelligence, categorization, machine learning, (13 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York (0.05)
North America > United States > Iowa (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Add feedback