AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Latent structure blockmodels for Bayesian spectral graph clustering

Passino, Francesco Sanna, Heard, Nicholas A.

arXiv.org Machine LearningJul-4-2021

Spectral embedding of network adjacency matrices often produces node representations living approximately around low-dimensional submanifold structures. In particular, hidden substructure is expected to arise when the graph is generated from a latent position model. Furthermore, the presence of communities within the network might generate community-specific submanifold structures in the embedding, but this is not explicitly accounted for in most statistical models for networks. In this article, a class of models called latent structure block models (LSBM) is proposed to address such scenarios, allowing for graph clustering when community-specific one dimensional manifold structure is present. LSBMs focus on a specific class of latent space model, the random dot product graph (RDPG), and assign a latent submanifold to the latent positions of each community. A Bayesian model for the embeddings arising from LSBMs is discussed, and shown to have a good performance on simulated and real world network data. The model is able to correctly recover the underlying communities living in a one-dimensional manifold, even when the parametric form of the underlying curves is unknown, achieving remarkable results on a variety of real data.

blockmodel, graph, latent position, (13 more...)

arXiv.org Machine Learning

2107.01734

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Clustering in Machine Learning - GeeksforGeeks

#artificialintelligenceJul-1-2021, 22:31:19 GMT

It is basically a type of unsupervised learning method . An unsupervised learning method is a method in which we draw references from datasets consisting of input data without labelled responses. Generally, it is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples. Clustering is the task of dividing the population or data points into a number of groups such that data points in the same groups are more similar to other data points in the same group and dissimilar to the data points in other groups. It is basically a collection of objects on the basis of similarity and dissimilarity between them. For ex– The data points in the graph below clustered together can be classified into one single group. We can distinguish the clusters, and we can identify that there are 3 clusters in the below picture. It is not necessary for clusters to be a spherical.

clustering, geeksforgeek, machine learning, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.78)

Add feedback

Few-shot Learning for Unsupervised Feature Selection

Kumagai, Atsutoshi, Iwata, Tomoharu, Fujiwara, Yasuhiro

arXiv.org Machine LearningJul-1-2021

We propose a few-shot learning method for unsupervised feature selection, which is a task to select a subset of relevant features in unlabeled data. Existing methods usually require many instances for feature selection. However, sufficient instances are often unavailable in practice. The proposed method can select a subset of relevant features in a target task given a few unlabeled target instances by training with unlabeled instances in multiple source tasks. Our model consists of a feature selector and decoder. The feature selector outputs a subset of relevant features taking a few unlabeled instances as input such that the decoder can reconstruct the original features of unseen instances from the selected ones. The feature selector uses the Concrete random variables to select features via gradient descent. To encode task-specific properties from a few unlabeled instances to the model, the Concrete random variables and decoder are modeled using permutation-invariant neural networks that take a few unlabeled instances as input. Our model is trained by minimizing the expected test reconstruction error given a few unlabeled instances that is calculated with datasets in source tasks. We experimentally demonstrate that the proposed method outperforms existing feature selection methods.

dataset, feature selection, selection, (14 more...)

arXiv.org Machine Learning

2107.00816

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Foundations of Data Science: K-Means Clustering in Python

#artificialintelligenceJun-30-2021, 09:50:42 GMT

This Course Organisations all around the world are using data to predict behaviours and extract valuable real-world insights to inform decisions. Managing and analysing big data has become an essential part of modern finance, retail, marketing, social science, development and research, medicine and government. This MOOC, designed by an academic team from Goldsmiths, University of London, will quickly introduce you to the core concepts of Data Science to prepare you for intermediate and advanced Data Science courses. It focuses on the basic mathematics, statistics and programming skills that are necessary for typical data analysis tasks. You will consider these fundamental concepts on an example data clustering task, and you will use this example to learn basic programming skills that are necessary for mastering Data Science techniques.

data science, foundation, k-means clustering, (2 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.66)

Industry: Education (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Fact Check: Analyzing Financial Events from Multilingual News Sources

Yang, Linyi, Ng, Tin Lok James, Smyth, Barry, Dong, Ruihai

arXiv.org Artificial IntelligenceJun-30-2021

The explosion in the sheer magnitude and complexity of financial news data in recent years makes it increasingly challenging for investment analysts to extract valuable insights and perform analysis. We propose FactCheck in finance, a web-based news aggregator with deep learning models, to provide analysts with a holistic view of important financial events from multilingual news sources and extract events using an unsupervised clustering method. A web interface is provided to examine the credibility of news articles using a transformer-based fact-checker. The performance of the fact checker is evaluated using a dataset related to merger and acquisition (M\&A) events and is shown to outperform several strong baselines.

analyzing financial event, fact check, multilingual news source

arXiv.org Artificial Intelligence

2106.15221

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Add feedback

Enhancing the Analysis of Software Failures in Cloud Computing Systems with Deep Learning

Cotroneo, Domenico, De Simone, Luigi, Liguori, Pietro, Natella, Roberto

arXiv.org Artificial IntelligenceJun-29-2021

Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems, and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts from manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, or in some cases even better than manually fine-tuned clustering, thus avoiding the need for deep domain knowledge and reducing the effort to perform the analysis. In all cases, the proposed approach provides better performance than unsupervised clustering when no feature engineering is applied to the data. Moreover, the distribution of failure modes from the proposed approach is closer to the actual frequency of the failure modes.

cloud computing system, deep learning, software failure, (1 more...)

arXiv.org Artificial Intelligence

2106.15182

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

Everything on Hierarchical Clustering

#artificialintelligenceJun-28-2021, 14:50:16 GMT

In this article, you will learn. Clustering is the most common form of unsupervised learning on unlabeled data to clusters objects with common characteristics into discrete clusters based on a distance measure. Hierarchical Clustering is either bottom-up, referred to as Agglomerative clustering, or Divisive, which uses a top-down approach. A bottom-up approach where each data point is considered a singleton cluster at the start, clusters are iteratively merged based on similarity until all data points have merged into one cluster. Agglomerative clustering agglomerates pairs of clusters based on maximum similarity calculated using distance metrics to obtain a new cluster, thus reducing the number of clusters with every iteration.

dendrogram, hierarchical, hierarchical clustering, (15 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.06)

Genre: Instructional Material (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

K-Means Clustering: Techniques to Find the Optimal Clusters

#artificialintelligenceJun-24-2021, 02:50:20 GMT

As the points are uniformly distributed, the KMeans algorithm evenly splits the points into K clusters even if there's no separation between them Gap Statistics gives the optimal number of the cluster as 10 based on the maximum gap between the cluster inertia on the data and null referenced data.

centroid, dataset, optimal cluster, (13 more...)

#artificialintelligence

Country: North America > United States > California > Santa Clara County > Palo Alto (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Exploring the Representational Power of Graph Autoencoder

Haddad, Maroun, Bouguessa, Mohamed

arXiv.org Artificial IntelligenceJun-22-2021

While representation learning has yielded a great success on many graph learning tasks, there is little understanding behind the structures that are being captured by these embeddings. For example, we wonder if the topological features, such as the Triangle Count, the Degree of the node and other centrality measures are concretely encoded in the embeddings. Furthermore, we ask if the presence of these structures in the embeddings is necessary for a better performance on the downstream tasks, such as clustering and classification. To address these questions, we conduct an extensive empirical study over three classes of unsupervised graph embedding models and seven different variants of Graph Autoencoders. Our results show that five topological features: the Degree, the Local Clustering Score, the Betweenness Centrality, the Eigenvector Centrality, and Triangle Count are concretely preserved in the first layer of the graph autoencoder that employs the SUM aggregation rule, under the condition that the model preserves the second-order proximity. We supplement further evidence for the presence of these features by revealing a hierarchy in the distribution of the topological features in the embeddings of the aforementioned model. We also show that a model with such properties can outperform other models on certain downstream tasks, especially when the preserved features are relevant to the task at hand. Finally, we evaluate the suitability of our findings through a test case study related to social influence prediction.

graph, node, topological feature, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neucom.2021.06.034

2106.12005

Country:

South America > Brazil (0.04)
Europe (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Transportation (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Fine grained analysis of K- mean clustering and where we are using it

#artificialintelligenceJun-21-2021, 02:01:10 GMT

You can transform data for multiple features to the same scale by normalizing the data. In particular, normalization is well-suited to processing the most common data distribution, the Gaussian distribution. Compared to quantiles, normalization requires significantly less data to calculate.

feature data, similarity, similarity measure, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback