AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Automatic Identification of Conceptual Metaphors With Limited Knowledge

Gandy, Lisa (Central Michigan University) | Allan, Nadji (Center for Advanced Defense Studies) | Atallah, Mark (Center for Advanced Defense Studies) | Frieder, Ophir (Georgetown University) | Howard, Newton (Massachusetts Institute of Technology) | Kanareykin, Sergey (Brain Sciences Foundation) | Koppel, Moshe (Bar-Ilan University) | Last, Mark (Ben Gurion University) | Neuman, Yair (Ben Gurion University) | Argamon, Shlomo (Illinois Institute of Technology)

AAAI Conferences, Jul-9-2013

Full natural language understanding requires identifying and analyzing the meanings of metaphors, which are ubiquitous in both text and speech. Over the last thirty years, linguistic metaphors have been shown to be based on more general conceptual metaphors, partial semantic mappings between disparate conceptual domains. Though some achievements have been made in identifying linguistic metaphors over the last decade or so, little work has been done to date on automatically identifying conceptual metaphors. This paper describes research on identifying conceptual metaphors based on corpus data. Our method uses as little background knowledge as possible, to ease transfer to new languages and to minimize any bias introduced by the knowledge base construction process. The method relies on general heuristics for identifying linguistic metaphors and statistical clustering (guided by WordNet) to form conceptual metaphor candidates. Human experiments show the system effectively finds meaningful conceptual metaphors.

machine learning, metaphor, natural language, (18 more...)

AAAI Conferences

Twenty-Seventh AAAI Conference on Artificial Intelligence

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Industry:

Government > Military (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Analogical Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Active Stratified Sampling with Clustering-Based Type Systems for Predicting the Search Tree Size of Problems with Real-Valued Heuristics

Lelis, Levi H. S. (University of Alberta)

AAAI Conferences, Jul-5-2013

In this paper we advance the line of research launched by Knuth and later improved by Chen for predicting the size of the search tree expanded by heuristic search algorithms such as IDA*. Chen's Stratified Sampling (SS) uses a partition of the nodes in the search tree, called a type system, to guide its sampling. Recent work has shown that SS using type systems based on integer-valued heuristic functions can be quite effective. However, type systems based on real-valued heuristic functions are often too large to be practical. We use the k-means clustering algorithm to create effective type systems for domains with real-valued heuristics. Orthogonal to the type systems, another contribution of this paper is the introduction of an algorithm called Active SS. SS allocates the same number of samples to each type. Active SS applies the idea of active sampling to search trees: it allocates more samples to the types with higher uncertainty. Our empirical results show that (i) SS using clustering-based type systems tends to produce better predictions than competing schemes that do not use a type system, and that (ii) Active SS can produce better predictions than the regular version of SS.
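To illustrate how k-means can turn real-valued heuristic values into a small number of types, here is a minimal sketch in Python. It is not the paper's implementation: the function name `kmeans_1d`, the evenly spaced initialization, and the sample values in `h_values` are all assumptions made for illustration, and a real type system could use more features than a single heuristic value.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[int]:
    """Cluster real-valued heuristic values into k types (Lloyd's algorithm).

    Returns one type index per input value. The initialization below
    (evenly spaced picks from the sorted values) is an illustrative choice.
    """
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centers[c])) for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels


# Hypothetical heuristic values for eight search-tree nodes.
h_values = [0.9, 1.1, 1.0, 5.2, 4.8, 5.0, 9.7, 10.1]
types = kmeans_1d(h_values, k=3)
print(types)
```

Nodes that share a type index would then be treated as one stratum by a sampling procedure, so the number of types is bounded by k rather than by the number of distinct real values.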

active stratified sampling, clustering-based type system, real-valued heuristic, (1 more...)

AAAI Conferences

Sixth Annual Symposium on Combinatorial Search

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Parallelising the k-Medoids Clustering Problem Using Space-Partitioning

Arbelaez, Alejandro (JFLI / University of Tokyo) | Quesada, Luis (University College Cork)

AAAI Conferences, Jul-5-2013

The k-medoids problem is a combinatorial optimisation problem with multiple applications in Resource Allocation, Mobile Computing, Sensor Networks and Telecommunications. Real instances of this problem involve hundreds of thousands of points and thousands of medoids. Despite the proliferation of parallel architectures, this problem has been mostly tackled using sequential approaches. In this paper, we study the impact of space-partitioning techniques on the performance of parallel local search algorithms to tackle the k-medoids clustering problem, and compare these results with the ones obtained using sampling. Our experiments suggest that approaches relying on partitioning scale better while preserving the quality of the solution.
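For readers unfamiliar with the objective, the sketch below shows the k-medoids cost and a plain sequential swap-based local search in Python. It is only an assumption-laden illustration of the underlying problem: it is not the parallel, space-partitioned algorithm studied in the paper, and the names `cost` and `swap_local_search` and the sample points are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def cost(points: Sequence[Point], medoids: Sequence[int]) -> float:
    """Sum over all points of the distance to the nearest medoid.

    Medoids are given as indices into `points`.
    """
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def swap_local_search(points: Sequence[Point], medoids: List[int]) -> List[int]:
    """Apply the first improving medoid/non-medoid swap until none exists."""
    current = list(medoids)
    best = cost(points, current)
    improved = True
    while improved:
        improved = False
        for i, cand in product(range(len(current)), range(len(points))):
            if cand in current:
                continue
            # Replace the i-th medoid by the candidate point.
            trial = current[:i] + [cand] + current[i + 1:]
            trial_cost = cost(points, trial)
            if trial_cost < best - 1e-12:
                current, best, improved = trial, trial_cost, True
                break  # restart the scan from the new solution
    return current


# Two well-separated groups; the start puts both medoids in the first group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = swap_local_search(points, [1, 2])
print(sorted(result), cost(points, result))
```

Each swap evaluation scans every point, which is the step that becomes expensive on large instances and that partitioning or sampling schemes aim to make cheaper.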

k-medoid clustering problem, parallelising, space-partitioning

AAAI Conferences

Sixth Annual Symposium on Combinatorial Search

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)

Semi-supervised clustering methods

Bair, Eric

arXiv.org Machine Learning, Jun-30-2013

Cluster analysis methods seek to partition a data set into homogeneous subgroups. They are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable, nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1002/wics.1270

1307.0252

Country: North America > United States (0.67)

Genre:

Workflow (0.68)
Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Constrained Optimization for a Subset of the Gaussian Parsimonious Clustering Models

Browne, Ryan P., Subedi, Sanjeena, McNicholas, Paul

arXiv.org Machine Learning, Jun-24-2013

The expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood estimates when data are incomplete or are treated as being incomplete. The EM algorithm and its variants are commonly used for parameter estimation in applications of mixture models for clustering and classification. This is despite the fact that even the Gaussian mixture model likelihood surface contains many local maxima and is riddled with singularities. Previous work has focused on circumventing this problem by constraining the smallest eigenvalue of the component covariance matrices. In this paper, we consider constraining the smallest eigenvalue, the largest eigenvalue, and both the smallest and largest within the family setting. Specifically, a subset of the GPCM family is considered for model-based clustering, where we use a re-parameterized version of the famous eigenvalue decomposition of the component covariance matrices. Our approach is illustrated using various experiments with simulated and real data.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1306.5824

Country:

Europe > Austria (0.28)
North America > Canada > Ontario (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Constrained fractional set programs and their application in local clustering and community detection

Bühler, Thomas, Rangapuram, Syama Sundar, Setzer, Simon, Hein, Matthias

arXiv.org Machine Learning, Jun-14-2013

The (constrained) minimization of a ratio of set functions is a problem frequently occurring in clustering and community detection. As these optimization problems are typically NP-hard, one uses convex or spectral relaxations in practice. While these relaxations can be solved globally optimally, they are often too loose and thus lead to results far away from the optimum. In this paper we show that every constrained minimization problem of a ratio of non-negative set functions allows a tight relaxation into an unconstrained continuous optimization problem. This result leads to a flexible framework for solving constrained problems in network analysis. While a globally optimal solution for the resulting non-convex problem cannot be guaranteed, we outperform the loose convex or spectral relaxations by a large margin on constrained local clustering problems.
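As a concrete instance of a ratio of set functions, the sketch below computes the conductance of a vertex subset, that is, the cut weight divided by the smaller of the two volumes. Conductance is chosen here only as a familiar example of such a ratio; the abstract does not say which ratios the paper treats. The function names and the toy graph are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Set

# Undirected weighted graph: each edge is a frozenset of its two endpoints.
Weights = Dict[FrozenSet[str], float]


def cut(weights: Weights, subset: Set[str]) -> float:
    """Total weight of edges with exactly one endpoint in `subset`."""
    return sum(w for edge, w in weights.items() if len(edge & subset) == 1)


def volume(weights: Weights, subset: Set[str]) -> float:
    """Sum of weighted degrees of the vertices in `subset` (no self-loops)."""
    return sum(w * len(edge & subset) for edge, w in weights.items())


def conductance(weights: Weights, subset: Set[str]) -> float:
    """Ratio cut(S) / min(vol(S), vol(complement of S)).

    Assumes `subset` is a non-empty proper subset of the vertices that
    touches at least one edge, so the denominator is non-zero.
    """
    vertices = set().union(*weights.keys())
    complement = vertices - subset
    return cut(weights, subset) / min(
        volume(weights, subset), volume(weights, complement)
    )


# Toy graph: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
weights = {frozenset(e): 1.0 for e in edges}
phi = conductance(weights, {"a", "b", "c"})
print(phi)
```

Minimizing such a ratio over all subsets, possibly under constraints such as requiring that a seed vertex belong to the subset, is the kind of combinatorial problem for which relaxations are used in practice.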

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

1306.3409

Country:

North America > United States (0.28)
Europe > Germany > Saarland (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Non-parametric Power-law Data Clustering

Fan, Xuhui, Zeng, Yiling, Cao, Longbing

arXiv.org Machine Learning, Jun-12-2013

It has always been a great challenge for clustering algorithms to automatically determine the number of clusters according to the distribution of datasets. Several approaches have been proposed to address this issue, including recent promising work that incorporates Bayesian nonparametrics into the k-means clustering procedure. This approach is simple to implement and theoretically sound, and it also provides a feasible way to perform inference on large-scale datasets. However, several problems remain unsolved in this pioneering work, including applicability to power-law data, a mechanism to merge centers to avoid over-fitting, the clustering order problem, etc. To address these issues, the Pitman-Yor Process based k-means (namely pyp-means) is proposed in this paper. Taking advantage of the Pitman-Yor Process, pyp-means treats clusters differently by dynamically and adaptively changing the threshold to guarantee the generation of power-law clustering results. Also, a center agglomeration procedure is integrated into the implementation to merge small but close clusters and then adaptively determine the cluster number. With more discussion on the clustering order, the convergence proof, complexity analysis and extension to spectral clustering, our approach is compared with traditional clustering algorithms and variational inference methods. The advantages and properties of pyp-means are validated by experiments on both synthetic and real-world datasets.
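The idea of letting a distance threshold decide when a new cluster is opened can be sketched as follows. This is an assumption-heavy illustration of a fixed-threshold rule in the style of DP-means, which is presumably the kind of prior work the abstract refers to; it is not pyp-means itself, whose adaptive threshold and center agglomeration step are not specified in the abstract. The function name, the choice of the first value as the initial center, and the sample data are hypothetical.

```python
from typing import List, Tuple


def threshold_kmeans(
    values: List[float], lam: float, iters: int = 20
) -> Tuple[List[int], List[float]]:
    """k-means-like clustering where the number of clusters is not fixed.

    A value whose squared distance to every existing center exceeds the
    fixed threshold `lam` opens a new cluster centered on itself.
    """
    centers = [values[0]]  # illustrative initialization: first value
    labels = [0] * len(values)
    for _ in range(iters):
        for i, v in enumerate(values):
            # Nearest existing center and its squared distance.
            d, c = min(((v - m) ** 2, j) for j, m in enumerate(centers))
            if d > lam:
                centers.append(v)  # open a new cluster
                c = len(centers) - 1
            labels[i] = c
        # Move each non-empty center to the mean of its members.
        for j in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


data = [1.0, 1.2, 0.8, 5.0, 5.2, 9.0]
labels, centers = threshold_kmeans(data, lam=1.0)
print(labels, len(centers))
```

Because the values are visited in a fixed order and new clusters are opened on the fly, the outcome can depend on that order, which is one reading of the "clustering order problem" mentioned above.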

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

1306.3003

Genre: Research Report > New Finding (0.46)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A Convergence Theorem for the Graph Shift-type Algorithms

Fan, Xuhui, Cao, Longbing

arXiv.org Machine Learning, Jun-12-2013

Graph Shift (GS) algorithms have recently attracted attention as a promising approach for discovering dense subgraphs in noisy data. However, there are no theoretical foundations for proving the convergence of the GS algorithm. In this paper, we propose a generic theoretical framework consisting of three key GS components: a simplex of the generated sequence set, a monotonic and continuous objective function, and a closed mapping. We prove that GS algorithms with such components can be transformed to fit Zangwill's convergence theorem, and that the sequence set generated by the GS procedures always terminates at a local maximum or, at worst, contains a subsequence which converges to a local maximum of the similarity measure function. The framework is verified by extending it to other GS-type algorithms and by experimental results.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

1306.3002

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

Azizyan, Martin, Singh, Aarti, Wasserman, Larry

arXiv.org Machine Learning, Jun-9-2013

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well understood. In this paper, we provide precise information-theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determines the mean separation, then the sample complexity depends only on the number of relevant dimensions and the mean separation, and can be achieved by a simple computationally efficient procedure. Our results provide the first step of a theoretical basis for recent methods that combine feature selection and clustering.

artificial intelligence, machine learning, probability, (18 more...)

arXiv.org Machine Learning

1306.2035

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.61)

Reducing statistical time-series problems to binary classification

Ryabko, Daniil, Mary, Jérémie

arXiv.org Machine Learning, Jun-7-2013

We show how binary classification methods developed to work on i.i.d. data can be used for solving statistical problems that are seemingly unrelated to classification and concern highly-dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. The algorithms that we construct for solving these problems are based on a new metric between time-series distributions, which can be evaluated using binary classification methods. Universal consistency of the proposed algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1210.6001

Country:

Asia (0.28)
Europe > France (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
