AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Nice Generalization of the K-NN Clustering Algorithm -- Also Useful for Data Reduction

@machinelearnbot, Aug-15-2017, 18:53:44 GMT

You don't need to know K-NN to understand this article. You don't need a background in statistical science either. Let's describe this new algorithm and its various components in simple English. We are dealing here with a supervised learning problem, and more specifically, clustering (also called supervised classification). In particular, we want to assign a class label to a new observation that does not belong to the training set. Instead of checking individual points (the nearest neighbors) and using a majority (voting) rule to assign the new observation to a cluster based on nearest neighbor counts, we check cliques of points and focus on the nearest cliques rather than on the nearest points. The cliques considered here are defined by circles (in two dimensions) or spheres (in three dimensions).

artificial intelligence, clique, machine learning, (11 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.41)

Comparing Distance Measurements with Python and SciPy

@machinelearnbot, Aug-15-2017, 15:20:09 GMT

Clustering, or cluster analysis, is used for analyzing data that does not include pre-labeled classes. Data instances are grouped together by maximizing intraclass similarity and minimizing the similarity between differing classes. In practice, the clustering algorithm identifies and groups instances that are very similar to one another, while instances in different groups are much less similar. As clustering does not require the pre-labeling of classes, it is a form of unsupervised learning. At the core of cluster analysis is the concept of measuring distances between data points across their dimensions.
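To make the idea of "measuring distances" concrete, here is a minimal, dependency-free sketch of three common distance measures and a pairwise distance matrix. The function names and sample points are illustrative assumptions, not taken from the article; SciPy's `scipy.spatial.distance` module offers `euclidean`, `cityblock`, and `chebyshev` functions that compute the same quantities.

```python
import math


def euclidean(u, v):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """L-infinity distance: largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def pairwise(points, metric):
    """Full distance matrix for a list of points under the given metric."""
    return [[metric(a, b) for b in points] for a in points]


# Hypothetical sample data, chosen so the results are easy to check by hand.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

for name, metric in [("euclidean", euclidean),
                     ("manhattan", manhattan),
                     ("chebyshev", chebyshev)]:
    print(name, pairwise(points, metric)[0])
```

The same pair of points can be "close" under one measure and "far" under another, which is why the choice of metric affects which instances a clustering algorithm groups together.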

artificial intelligence, machine learning, similarity, (11 more...)

@machinelearnbot

Country: Oceania > Australia > Australian Capital Territory > Canberra (0.07)

Industry: Information Technology > Security & Privacy (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.42)

Machine Learning Applications in Credit Risk

#artificialintelligence, Aug-13-2017, 15:30:31 GMT

Typical decisions:
• Grant or deny credit to new applicants
• Increase or decrease spending limits
• Increase or decrease lending rates
• Decide which new products can be offered to existing applicants

Step 2: Assign every entity to its closest medoid (using the distance matrix we have calculated).
Step 3: For each cluster, check whether any observation would yield a lower average distance than the current medoid if it were re-assigned as the medoid. If so, make the best such observation the new medoid.
Step 4: If at least one medoid has changed, return to step 2. Otherwise, end the algorithm.
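Steps 2 to 4 above can be sketched as a short loop over a precomputed distance matrix. This is a minimal illustration, not the article's implementation: the excerpt does not show step 1, so the sketch assumes the caller supplies the initial medoids, and the names `assign` and `k_medoids`, the `max_iter` cap, and the sample data are all hypothetical.

```python
def assign(dist, medoids):
    """Step 2: map each medoid index to the indices of its closest entities."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        closest = min(medoids, key=lambda m: dist[i][m])
        clusters[closest].append(i)
    return clusters


def k_medoids(dist, initial_medoids, max_iter=100):
    """Iterate steps 2-4 on a square distance matrix (list of lists).

    `initial_medoids` is an assumption: the excerpt omits how step 1
    chooses the starting medoids.
    """
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # Step 3: in each cluster, pick the member with the lowest total
        # (equivalently, average) distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m  # keep the medoid if its cluster is empty
            for m, members in clusters.items()
        ]
        # Step 4: stop once no medoid has changed.
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical one-dimensional data with two obvious groups.
values = [1, 2, 3, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]

medoids, clusters = k_medoids(dist, initial_medoids=[0, 1])
print(medoids, clusters)
```

Because the medoids are always actual observations, the procedure only needs the distance matrix, so any distance measure can be plugged in without changing the loop.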

artificial intelligence, machine learning, open-source software, (15 more...)

#artificialintelligence

Country:

North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)

Industry:

Banking & Finance > Credit (0.67)
Banking & Finance > Loans (0.48)
Automobiles & Trucks > Manufacturer (0.48)
Banking & Finance > Risk Management (0.41)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Mahalanobis Distance Informed by Clustering

Lahav, Almog, Talmon, Ronen, Kluger, Yuval

arXiv.org Machine Learning, Aug-13-2017

A fundamental question in data analysis, machine learning and signal processing is how to compare data points. The choice of the distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real data of gene expression for lung adenocarcinomas (lung cancer). By using the proposed metric we found a partition of subjects into risk groups with a good separation between their Kaplan-Meier survival plots.

artificial intelligence, machine learning, principal direction, (18 more...)

arXiv.org Machine Learning

1708.03914

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Model-Based Multiple Instance Learning

Vo, Ba-Ngu, Phung, Dinh, Tran, Quang N., Vo, Ba-Tuong

arXiv.org Machine Learning, Aug-13-2017

While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.

data mining, machine learning, point pattern, (16 more...)

arXiv.org Machine Learning

1703.02155

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Co-Clustering Can Provide Industrial Data Pattern Discovery

#artificialintelligence, Aug-12-2017, 10:30:15 GMT

Despite rapid development in data acquisition technology and the resulting explosion in collected datasets, techniques for organizing, classifying, manipulating, and analyzing very large, diverse, heterogeneous datasets have evolved only modestly. This has hindered the effective use and understanding of large-scale acquired data for knowledge discovery. In an industrial setting, an interesting visual from McKinsey illustrates that despite collecting data from tens of thousands of sensors, less than 1% is actually utilized. Data clustering is the classification of data objects into different groups (clusters) such that data objects in one group are similar to one another and dissimilar from those in other groups. Typically, homogeneous data objects, i.e. data objects having the same data type, are grouped together using some of the well-known clustering algorithms.

artificial intelligence, data mining, machine learning, (14 more...)

#artificialintelligence

Industry: Media > Film (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.80)

Machine Learning: An In-Depth Guide – Unsupervised Learning, Related Fields, and Machine Learning in Practice

#artificialintelligence, Aug-11-2017, 05:20:29 GMT

Welcome to the fifth and final article in a five-part series about machine learning. In this final article, we will revisit unsupervised learning in greater depth, briefly discuss other fields related to machine learning, and finish the series with some examples of real-world machine learning applications. Recall that unsupervised learning involves learning from data, but without the goal of prediction. This is because the data is either not given with a target response variable (label), or one chooses not to designate a response. Unsupervised learning can also be used as a pre-processing step for supervised learning.

artificial intelligence, learning, machine learning, (13 more...)

#artificialintelligence

Country: North America > United States > Illinois > Cook County > Chicago (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.83)

Multilayer Spectral Graph Clustering via Convex Layer Aggregation: Theory and Algorithms

Chen, Pin-Yu, Hero, Alfred O.

arXiv.org Machine Learning, Aug-8-2017

Multilayer graphs provide a framework for representing multiple types of relations between entities, represented as nodes. In a multilayer graph, each layer describes a specific type of relation among pairs of nodes that are shared across layers. For example, in multi-relational social networks, two layers might correspond to friendship relations and business relations, respectively. In temporal networks, each layer might correspond to a snapshot of the entire network at a sampled time instant. Multilayer graphs can be incorporated into many signal processing and data mining techniques, including inference of mixture models [1], [2], tensor decomposition [3], information extraction [4], multi-view learning and processing [5], graph wavelet transforms [6], principal component analysis and dictionary learning [7], [8], anomaly detection [9], and community detection [10], [11], among others. The objective of multilayer graph clustering is to find a consensus cluster assignment for each node in the common node set by combining connectivity patterns in each layer.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1708.0262

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Industry:

Information Technology (0.88)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

[P] KMin - Clustering algorithm • r/MachineLearning

@machinelearnbot, Aug-7-2017, 17:10:01 GMT

In cases where an L1 norm or L-infinity norm better describes distance, this could be useful. For example, dealing with a square-grid pattern in city streets may yield better results when using scaled geographic coordinates. K-means is effectively an algorithm that treats all points around each cluster center as distributed according to an N-dimensional normal distribution whose covariance has a constant diagonal and no correlations. This works well when your clusters are roughly circular in shape (which corresponds to the L2 norm of Euclidean space). If your cluster patterns were squares, cubes or hypercubes, an L-infinity norm would work better, and likewise an L1 norm would work better for diamond shapes.
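As a small illustration of the point about norms, the sketch below assigns one point to its nearest cluster center under the L1, L2, and L-infinity norms. It is not the KMin algorithm from the post; the helper names `minkowski` and `nearest_center` and the coordinates are assumptions chosen so that the norms disagree.

```python
def minkowski(u, v, p):
    """Distance between u and v under the Lp norm (p = inf gives L-infinity)."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def nearest_center(point, centers, p):
    """Index of the center closest to `point` under the Lp norm."""
    return min(range(len(centers)),
               key=lambda i: minkowski(point, centers[i], p))


# Hypothetical centers and query point.
centers = [(0.0, 0.0), (6.0, 3.0)]
point = (4.0, 0.0)

for p in (1, 2, float("inf")):
    print(p, nearest_center(point, centers, p))
```

With these coordinates the L1 norm assigns the point to the first center, while the L2 and L-infinity norms assign it to the second, so the choice of norm changes cluster membership even when the centers stay fixed.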

machine learning, machinelearning, social media, (4 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)