AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

8 Clustering Algorithms in Machine Learning that All Data Scientists Should Know

#artificialintelligence | Sep-22-2020, 00:40:49 GMT

There are three different approaches to machine learning, depending on the data you have: supervised learning, semi-supervised learning, and unsupervised learning. In supervised learning you have labeled data, so you know the correct output values for your inputs. That's like knowing car prices based on features like make, model, style, drivetrain, and other attributes. With semi-supervised learning, you have a large data set where some of the data is labeled but most of it isn't. This covers a large amount of real-world data because it can be expensive to get an expert to label every data point.

algorithm, artificial intelligence, machine learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Privacy Preserving K-Means Clustering: A Secure Multi-Party Computation Approach

Ramírez, Daniel Hurtado, Auñón, J. M.

arXiv.org Machine Learning | Sep-22-2020

Knowledge discovery is one of the main goals of Artificial Intelligence. This knowledge is usually stored in databases spread across different environments, which makes accessing and extracting data from them tedious (or impossible). In addition, these data sources may contain private data, so the information can never leave the source. Privacy Preserving Machine Learning (PPML) helps to overcome this difficulty by employing cryptographic techniques that allow knowledge discovery while ensuring data privacy. K-means is one of the data mining techniques used to discover knowledge, grouping data points into clusters that contain similar features. This paper focuses on Privacy Preserving Machine Learning applied to K-means using recent protocols from the field of cryptography. The algorithm is applied to different scenarios where data may be distributed either horizontally or vertically.
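For readers unfamiliar with the K-means step the abstract builds on, the following is a minimal sketch of plain (non-private) Lloyd-style K-means. It is not the paper's secure multi-party protocol. The function name `kmeans`, the deterministic seeding with the first k points, and the toy data are all assumptions made for illustration.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Plain Lloyd-style k-means on a list of equal-length tuples."""
    # Assumption: seed the centroids with the first k points (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

In a privacy-preserving setting, the assignment and update steps would have to be computed without revealing the raw points; this sketch only shows what those two steps compute in the clear.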

artificial intelligence, data mining, machine learning, (16 more...)

2009.10453

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.85)

Explainable, Stable, and Scalable Graph Convolutional Networks for Learning Graph Representation

Lu, Ping-En, Chang, Cheng-Shang

arXiv.org Machine Learning | Sep-22-2020

The network embedding problem that maps nodes in a graph to vectors in Euclidean space can be very useful for addressing several important tasks on a graph. Recently, graph neural networks (GNNs) have been proposed for solving such a problem. However, most embedding algorithms and GNNs are difficult to interpret and do not scale well to handle millions of nodes. In this paper, we tackle the problem from a new perspective based on the equivalence of three constrained optimization problems: the network embedding problem, the trace maximization problem of the modularity matrix in a sampled graph, and the matrix factorization problem of the modularity matrix in a sampled graph. The optimal solutions to these three problems are the dominant eigenvectors of the modularity matrix. We propose two algorithms that belong to a special class of graph convolutional networks (GCNs) for solving these problems: (i) Clustering As Feature Embedding GCN (CAFE-GCN) and (ii) sphere-GCN. Both algorithms are stable trace maximization algorithms, and they yield good approximations of dominant eigenvectors. Moreover, there are linear-time implementations for sparse graphs. In addition to solving the network embedding problem, both proposed GCNs are capable of performing dimensionality reduction. Various experiments are conducted to evaluate our proposed GCNs and show that they outperform almost all the baseline methods. Moreover, CAFE-GCN can benefit from labeled data and shows tremendous improvements in various performance metrics.
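To make the phrase "dominant eigenvectors of the modularity matrix" concrete, here is a small sketch that builds the standard modularity matrix B = A - d d^T / (2m) for an undirected graph and approximates its leading eigenvector with shifted power iteration. This is not CAFE-GCN or sphere-GCN, and it uses the full graph rather than a sampled graph. The helper names, the diagonal shift, and the toy graph (two triangles joined by one edge) are assumptions chosen for illustration.

```python
def modularity_matrix(adj):
    """B[i][j] = A[i][j] - d_i * d_j / (2m) for an undirected graph."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # sum of degrees equals twice the edge count
    return [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)] for i in range(n)]


def dominant_eigenvector(mat, iterations=500):
    """Power iteration on mat + shift*I so the largest algebraic eigenvalue wins."""
    n = len(mat)
    # The largest absolute row sum bounds the spectral radius, so after the
    # shift every eigenvalue is non-negative and ordering is preserved.
    shift = max(sum(abs(x) for x in row) for row in mat)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [
            sum(mat[i][j] * vec[j] for j in range(n)) + shift * vec[i]
            for i in range(n)
        ]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical toy graph: triangle {0, 1, 2} and triangle {3, 4, 5}, bridged by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1

B = modularity_matrix(adj)
vec = dominant_eigenvector(B)
# Rayleigh quotient of the unit-norm vector gives the eigenvalue estimate.
eigenvalue = sum(vec[i] * B[i][j] * vec[j] for i in range(n) for j in range(n))
# The sign pattern of the leading eigenvector splits the nodes into two groups.
groups = [v > 0 for v in vec]
```

On this toy graph the sign pattern separates the two triangles, which is the usual intuition for why leading eigenvectors of the modularity matrix are useful as embeddings.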

artificial intelligence, data mining, machine learning, (19 more...)

2009.10367

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Convex Subspace Clustering by Adaptive Block Diagonal Representation

Lin, Yunxia, Chen, Songcan

arXiv.org Machine Learning | Sep-22-2020

Subspace clustering is a class of extensively studied clustering methods, and spectral-type approaches are an important subclass whose key first step is to learn a coefficient matrix with block diagonal structure. To realize this step, sparse subspace clustering (SSC), low rank representation (LRR) and block diagonal representation (BDR) were successively proposed and have become the state of the art (SOTA). Among them, the former two minimize convex objectives by imposing sparsity and low rankness on the coefficient matrix, respectively, but the desired block diagonality cannot necessarily be guaranteed in practice, while the latter designs a regularizer induced by a block diagonal matrix but sacrifices convexity. To resolve this dilemma, inspired by Convex Biclustering, we propose a simple yet efficient spectral-type subspace clustering method named Adaptive Block Diagonal Representation (ABDR). Like BDR, it pursues the desired block diagonality, but it does so by coercively fusing the columns/rows of the coefficient matrix via a specially designed convex regularizer. Consequently, ABDR naturally enjoys the merits of these methods and can adaptively form more desirable block diagonality than the SOTA methods without needing to fix the number of blocks in advance, as BDR does. Finally, experimental results on synthetic and real benchmarks demonstrate the superiority of ABDR.

artificial intelligence, machine learning, subspace, (12 more...)

2009.09386

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.56)

12 Cool Data Science Projects Ideas for Beginners and Experts

#artificialintelligence | Sep-21-2020, 10:55:05 GMT

Chatbots play a pivotal role for businesses as they can effortlessly handle a barrage of customer queries and messages without any slowdown. They have single-handedly reduced the customer service workload for us by automating a majority of the process. They do this by utilizing techniques backed with Artificial Intelligence, Machine Learning, and Data Science. Chatbots work by analyzing the input from the customer and replying with an appropriate mapped response. To train the chatbot, you can use Recurrent Neural Networks with the intents JSON dataset while the implementation can be handled using Python.

artificial intelligence, deep learning, machine learning, (13 more...)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering

Srinivas, Aravind, Krishnamurthy, Ramnandan, Kumar, Peeyush, Ravindran, Balaraman

arXiv.org Artificial Intelligence | Sep-21-2020

This paper introduces an automated skill acquisition framework in reinforcement learning which involves identifying a hierarchical description of the given task in terms of abstract states and extended actions between abstract states. Identifying such structures present in the task provides ways to simplify and speed up reinforcement learning algorithms. These structures also help to generalize such algorithms over multiple tasks without relearning policies from scratch. We use ideas from dynamical systems to find metastable regions in the state space and associate them with abstract states. The spectral clustering algorithm PCCA+ is used to identify suitable abstractions aligned to the underlying structure. Skills are defined in terms of the sequence of actions that lead to transitions between such abstract states. The connectivity information from PCCA+ is used to generate these skills or options. These skills are independent of the learning task and can be efficiently reused across a variety of tasks defined over the same model. This approach works well even without the exact model of the environment by using sample trajectories to construct an approximate estimate. We also present our approach to scaling the skill acquisition framework to complex tasks with large state spaces: we perform state aggregation using the representation learned from an action-conditional video prediction network and then apply the skill acquisition framework to the aggregated state space.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1605.05359

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.64)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Graph Based Multi-layer K-means++ (G-MLKM) for Sensory Pattern Analysis in Constrained Spaces

Tao, Feng, Suresh, Rengan, Votion, Johnathan, Cao, Yongcan

arXiv.org Machine Learning | Sep-21-2020

In this paper, we focus on developing a novel unsupervised machine learning algorithm, named graph based multi-layer k-means++ (G-MLKM), to solve the data-target association problem when targets move on a constrained space and only minimal information about the targets can be obtained by sensors. Instead of employing the traditional data-target association methods that are based on statistical probabilities, G-MLKM solves the problem via data clustering. We first develop the Multi-layer K-means++ (MLKM) method for data-target association in a local space, given a simplified constrained space situation. Then a p-dual graph is proposed to represent the general constrained space when local spaces are interconnected. Based on the dual graph and graph theory, we then generalize MLKM to G-MLKM by first understanding local data-target association and then extracting cross-local data-target association, mathematically analyzing the data association at intersections of that space. To exclude potential data-target association errors that disobey physical rules, we also develop error correction mechanisms to further improve the accuracy. Numerous simulation examples are conducted to demonstrate the performance of G-MLKM.
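The method's name refers to k-means++, whose distinguishing step is how the initial centers are chosen. The sketch below shows only that standard seeding rule (each new center is sampled with probability proportional to its squared distance from the nearest center already chosen). It is not the multi-layer or graph-based procedure from the paper. The function name `kmeans_pp_seeds`, the `seed` parameter, and the toy points are assumptions made for illustration.

```python
import random


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers from `points` using the k-means++ sampling rule."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
            for p in points
        ]
        # Next center: sampled with probability proportional to d2, so points
        # far from all current centers are favored and chosen points (d2 == 0)
        # cannot be picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Hypothetical toy data: six distinct points forming three loose groups.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.5, 0.0), (0.0, 10.0), (0.5, 10.0)]
centers = kmeans_pp_seeds(points, k=3)
```

Note that `rng.choices` raises an error if every weight is zero, which happens only when k exceeds the number of distinct points; a production version would need to guard against that case.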

artificial intelligence, machine learning, road segment, (17 more...)

2009.09925

Country:

North America > United States > Texas > Bexar County > San Antonio (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Industry:

Transportation > Infrastructure & Services (0.39)
Transportation > Ground > Road (0.39)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Contrastive Clustering

Li, Yunfan, Hu, Peng, Liu, Zitao, Peng, Dezhong, Zhou, Joey Tianyi, Peng, Xi

arXiv.org Machine Learning | Sep-21-2020

In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) which explicitly performs instance- and cluster-level contrastive learning. To be specific, for a given dataset, the positive and negative instance pairs are constructed through data augmentations and then projected into a feature space. Therein, the instance- and cluster-level contrastive learning are respectively conducted in the row and column space by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix could be regarded as soft labels of instances, and accordingly the columns could be further regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive loss, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, an improvement of up to 19% (39%) over the best baseline.
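The abstract reports results as NMI (normalized mutual information), a score that compares a predicted clustering with reference labels and is unaffected by how cluster ids are named. The sketch below computes one common variant, normalizing mutual information by the arithmetic mean of the two entropies. Which normalization the paper uses is not stated in the abstract, so that choice is an assumption, as are the function names and the toy labels.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the entropies (an assumption)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both assignments put everything in a single cluster
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


# Hypothetical toy labels.
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])  # identical grouping, renamed ids
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])     # groupings share no information
```

A score near 1 means the two groupings agree up to relabeling, and a score near 0 means knowing one grouping tells you nothing about the other.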

artificial intelligence, machine learning, representation, (16 more...)

2009.09687

Country:

North America > United States > California > Alameda County > Oakland (0.04)
Asia > Singapore (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Interactive Steering of Hierarchical Clustering

Yang, Weikai, Wang, Xiting, Lu, Jie, Dou, Wenwen, Liu, Shixia

arXiv.org Machine Learning | Sep-21-2020

Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users. The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven), and 2) enabling the interactive steering of clustering through a visual interface (user-driven). Our method first maps each data item to the most relevant items in a knowledge base. An initial constraint tree is then extracted using the ant colony optimization algorithm. The algorithm balances the tree width and depth and covers the data items with high confidence. Given the constraint tree, the data items are hierarchically clustered using the evolutionary Bayesian rose tree. To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies and interactively improve them. The quantitative evaluation and case study demonstrate that the proposed approach facilitates the building of customized clustering trees in an efficient and effective manner.

artificial intelligence, hierarchy, machine learning, (15 more...)

doi: 10.1109/TVCG.2020.2995100

2009.09618

Country:

North America > United States > North Carolina (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting > Higher Education (0.67)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Overlapping community detection in networks via sparse spectral decomposition

Arroyo, Jesús, Levina, Elizaveta

arXiv.org Machine Learning | Sep-20-2020

We consider the problem of estimating overlapping community memberships in a network, where each node can belong to multiple communities. More than a few communities per node are difficult to both estimate and interpret, so we focus on sparse node membership vectors. Our algorithm is based on sparse principal subspace estimation with iterative thresholding. The method is computationally efficient, with a computational cost equivalent to estimating the leading eigenvectors of the adjacency matrix, and does not require an additional clustering step, unlike spectral clustering methods. We show that a fixed point of the algorithm corresponds to correct node memberships under a version of the stochastic block model. The methods are evaluated empirically on simulated and real-world networks, showing good statistical performance and computational efficiency.

artificial intelligence, data mining, machine learning, (18 more...)

2009.10641

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Government (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
