AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Learning From Hidden Traits: Joint Factor Analysis and Latent Clustering

Yang, Bo, Fu, Xiao, Sidiropoulos, Nicholas D.

arXiv.org Machine LearningMay-21-2016

Dimensionality reduction techniques play an essential role in data analytics, signal processing and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure -- which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2016.2614491

1605.06711

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Industry:

Law (0.97)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Add feedback

Clustering Algorithms: From Start To State Of The Art

#artificialintelligenceMay-19-2016, 23:27:08 GMT

It's not a bad time to be a Data Scientist. Serious people may find interest in you if you turn the conversation towards "Big Data", and the rest of the party crowd will be intrigued when you mention "Artificial Intelligence" and "Machine Learning". Even Google thinks you're not bad, and that you're getting even better. There are a lot of'smart' algorithms that help data scientists do their wizardry. It may all seem complicated, but if we understand and organize algorithms a bit, it's not even that hard to find and apply the one that we need. Courses on data mining or machine learning will usually start with clustering, because it is both simple and useful. It is an important part of a somewhat wider area of Unsupervised Learning, where the data we want to describe is not labeled.

artificial intelligence, exemplar, machine learning, (15 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Geometry of Interest (GOI): Spatio-Temporal Destination Extraction and Partitioning in GPS Trajectory Data

Mousavi, Seyed Morteza, Harwood, Aaron, Karunasekera, Shanika, Maghrebi, Mojtaba

arXiv.org Artificial IntelligenceMay-16-2016

Noname manuscript No. (will be inserted by the editor) Abstract Nowadays large amounts of GPS trajectory data is being continuously collected by GPSenabled devices such as vehicles navigation systems and mobile phones. GPS trajectory data is useful for applications such as traffic management, location forecasting, and itinerary planning. Such applications often need to extract the time-stamped Sequence of Visited Locations (SVLs) of the mobile objects. The nearest neighbor query (NNQ) is the most applied method for labeling the visited locations based on the IDs of the POIs in the process of SVL generation. NNQ in some scenarios is not accurate enough. To improve the quality of the extracted SVLs, instead of using NNQ, we label the visited locations as the IDs of the POIs which geometrically intersect with the GPS observations. In this paper we propose a novel method for estimating the POIs and their GOIs, which consists of three phases: (i) extracting the geometries of the stay regions; (ii) constructing the geometry of destination regions based on the extracted stay regions; and (iii) constructing the GOIs based on the geometries of the destination regions. Using the geometric similarity to known GOIs as the major evaluation criterion, the experiments we performed using long-term GPS trajectory data show that our method outperforms the existing approaches. Keywords Trajectory Data, Spatio-Temporal Partitioning, Geometry of Interest, Time-Value, Time-Weighted Centroid, Destination Extraction 1 Introduction In recent years, GPS trajectory data has become abundant due to the many GPS enabled devices used on a daily basis. Mining these GPS trajectories for gathering useful information for applications has received a growing amount of attention in the recent literature. In this field, researchers have tried to derive knowledge for solving practical problems (e.g. The applications dealing with data analysis on trajectory data often need to have access to information about the significant places which a mobile object frequently travels and stay. These significant places are referred to as the points of interest (POIs).

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1603.0411

Country: North America > United States > New York (0.14)

Genre: Research Report (0.84)

Industry: Transportation > Ground > Road (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management (0.93)
(2 more...)

Add feedback

Wisdom of Crowds cluster ensemble

Alizadeh, Hosein, Yousefnezhad, Muhammad, Bidgoli, Behrouz Minaei

arXiv.org Machine LearningMay-13-2016

The Wisdom of Crowds is a phenomenon described in social science that suggests four criteria applicable to groups of people. It is claimed that, if these criteria are satisfied, then the aggregate decisions made by a group will often be better than those of its individual members. Inspired by this concept, we present a novel feedback framework for the cluster ensemble problem, which we call Wisdom of Crowds Cluster Ensemble (WOCCE). Although many conventional cluster ensemble methods focusing on diversity have recently been proposed, WOCCE analyzes the conditions necessary for a crowd to exhibit this collective wisdom. These include decentralization criteria for generating primary results, independence criteria for the base algorithms, and diversity criteria for the ensemble members. We suggest appropriate procedures for evaluating these measures, and propose a new measure to assess the diversity. We evaluate the performance of WOCCE against some other traditional base algorithms as well as state-of-the-art ensemble methods. The results demonstrate the efficiency of WOCCE's aggregate decision-making compared to other algorithms.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.3233/IDA-150728

1605.04074

Country:

North America > United States (1.00)
Asia > Middle East > Iran (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Unsupervised Semantic Action Discovery from Video Collections

Sener, Ozan, Zamir, Amir Roshan, Wu, Chenxia, Savarese, Silvio, Saxena, Ashutosh

arXiv.org Machine LearningMay-11-2016

Human communication takes many forms, including speech, text and instructional videos. It typically has an underlying structure, with a starting point, ending, and certain objective steps between them. In this paper, we consider instructional videos where there are tens of millions of them on the Internet. We propose a method for parsing a video into such semantic steps in an unsupervised way. Our method is capable of providing a semantic "storyline" of the video composed of its objective steps. We accomplish this using both visual and language cues in a joint generative model. Our method can also provide a textual description for each of the identified semantic steps and video segments. We evaluate our method on a large number of complex YouTube videos and show that our method discovers semantically correct instructions for a variety of tasks.

data mining, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1605.03324

Country: North America > United States > California > Santa Clara County (0.28)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (0.68)

Industry: Education > Educational Technology (0.95)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Add feedback

An efficient K-means algorithm for Massive Data

Capó, Marco, Pérez, Aritz, Lozano, José Antonio

arXiv.org Machine LearningMay-10-2016

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to ma- nipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm remains as one of the most popular clustering methods, in spite of its dependency on the initial settings and high computational cost, especially in terms of distance computations. In this work, we propose an efficient approximation to the K-means problem intended for massive data. Our approach recursively partitions the entire dataset into a small number of sub- sets, each of which is characterized by its representative (center of mass) and weight (cardinality), afterwards a weighted version of the K-means algorithm is applied over such local representation, which can drastically reduce the number of distances computed. In addition to some theoretical properties, experimental results indicate that our method outperforms well-known approaches, such as the K-means++ and the minibatch K-means, in terms of the relation between number of distance computations and the quality of the approximation.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1605.02989

Country: Europe > Spain (0.28)

Genre: Workflow (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Clustering subgaussian mixtures by semidefinite programming

Mixon, Dustin G., Villar, Soledad, Ward, Rachel

arXiv.org Machine LearningMay-10-2016

We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context of subgaussian mixture models. We also study the fundamental limits of estimating Gaussian centers by k-means clustering in order to compare our approximation guarantee to the theoretically optimal k-means clustering solution.

artificial intelligence, denote, machine learning, (18 more...)

arXiv.org Machine Learning

1602.06612

Country: North America > United States (0.92)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Oracle Based Active Set Algorithm for Scalable Elastic Net Subspace Clustering

You, Chong, Li, Chun-Guang, Robinson, Daniel P., Vidal, Rene

arXiv.org Machine LearningMay-9-2016

State-of-the-art subspace clustering methods are based on expressing each data point as a linear combination of other data points while regularizing the matrix of coefficients with $\ell_1$, $\ell_2$ or nuclear norms. $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad theoretical conditions, but the clusters may not be connected. $\ell_2$ and nuclear norm regularization often improve connectivity, but give a subspace-preserving affinity only for independent subspaces. Mixed $\ell_1$, $\ell_2$ and nuclear norm regularizations offer a balance between the subspace-preserving and connectedness properties, but this comes at the cost of increased computational complexity. This paper studies the geometry of the elastic net regularizer (a mixture of the $\ell_1$ and $\ell_2$ norms) and uses it to derive a provably correct and scalable active set method for finding the optimal coefficients. Our geometric analysis also provides a theoretical justification and a geometric interpretation for the balance between the connectedness (due to $\ell_2$ regularization) and subspace-preserving (due to $\ell_1$ regularization) properties for elastic net subspace clustering. Our experiments show that the proposed active set method not only achieves state-of-the-art clustering performance, but also efficiently handles large-scale datasets.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

1605.02633

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Semi-Unsupervised Clustering Using Reinforcement Learning

Bose, Sourabh (The University of Texas at Arlington) | Huber, Manfred (The University of Texas at Arlington)

AAAI ConferencesMay-8-2016

Clusters defined over a dataset by unsupervised clustering often present groupings which differ from the expected solution. This is primarily the case when some scarce knowledge of the problem exists beforehand that partially identifies desired characteristics of clusters. However conventional clustering algorithms are not defined to expect any supervision from the external world, as they are supposed to be completely unsupervised. As a result they can not benefit or effectively take into account available information about the use or properties of the clusters. In this paper we propose a reinforcement learning approach to address this problem where existing, unmodified unsupervised clustering algorithms are augmented in a way that the available sparse information is utilized to achieve more appropriate clusters. Our model works with any clustering algorithm, but the input to the algorithm, instead of being the original dataset, is a scaled version of the same, where the scaling factors are determined by the reinforcement learning algorithm.

reinforcement learning, semi-unsupervised clustering

AAAI Conferences

The Twenty-Ninth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Text Analysis 101; A Basic Understanding for Business Users: Clustering and Unsupervised Methods

@machinelearnbotMay-6-2016, 01:26:19 GMT

This blog was originally posted as part of our Text Analysis 101 blog series. It aims to explain how the classification of text works as part of Natural Language Processing. It was the second blog on harnessing Machine Learning (ML) in the form of Natural Language Processing (NLP) for the Automatic Classification of documents. By classifying text, we aim to assign a document or piece of text to one or more classes or categories making it easier to manage or sort. A Document Classifier often returns or assigns a category "label" or "code" to a document or piece of text.

machine learning, natural language, text classification, (15 more...)

@machinelearnbot

Country: North America > United States > New York (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

Add feedback