AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Community Detection by Principal Components Clustering Methods

Qing, Huan, Wang, Jingli

arXiv.org Machine LearningNov-9-2020

Based on the classical Degree Corrected Stochastic Blockmodel (DCSBM) model for network community detection problem, we propose two novel approaches: principal component clustering (PCC) and normalized principal component clustering (NPCC). Without any parameters to be estimated, the PCC method is simple to be implemented. Under mild conditions, we show that PCC yields consistent community detection. NPCC is designed based on the combination of the PCC and the RSC method (Qin & Rohe 2013). Population analysis for NPCC shows that NPCC returns perfect clustering for the ideal case under DCSBM. PCC and NPCC is illustrated through synthetic and real-world datasets. Numerical results show that NPCC provides a significant improvement compare with PCC and RSC. Moreover, NPCC inherits nice properties of PCC and RSC such that NPCC is insensitive to the number of eigenvectors to be clustered and the choosing of the tuning parameter. When dealing with two weak signal networks Simmons and Caltech, by considering one more eigenvectors for clustering, we provide two refinements PCC+ and NPCC+ of PCC and NPCC, respectively. Both two refinements algorithms provide improvement performances compared with their original algorithms. Especially, NPCC+ provides satisfactory performances on Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.

eigenvector, matrix, npcc, (15 more...)

arXiv.org Machine Learning

2011.04377

Country:

North America > United States (0.46)
Asia > China (0.04)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.65)

Add feedback

How to Build Audience Clusters With Website Data Using BigQuery ML

#artificialintelligenceNov-8-2020, 13:12:43 GMT

A common marketing analytics challenge is to understand consumer behavior and develop customer attributes or archetypes. As organizations get better at tackling this problem, they can activate marketing strategies to incorporate additional customer knowledge into their campaigns. Building customer profiles is now easier than ever with BigQuery ML, using a technique called clustering. In this post, you'll learn how to create segmentation and how to use these audiences for marketing activation. Clustering algorithms can group similar user behavior together to build segmentation used for marketing.

algorithm, bigquery ml, google analytic 360, (13 more...)

#artificialintelligence

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Enterprise Applications > Customer Relationship Management (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.38)

Add feedback

Learning Online Data Association

Du, Yilun, Tenenbaum, Joshua, Lozano-Perez, Tomas, Kaelbling, Leslie

arXiv.org Artificial IntelligenceNov-5-2020

When an agent interacts with a complex environment, it receives a stream of percepts in which it may detect entities, such as objects or people. To build up a coherent, low-variance estimate of the underlying state, it is necessary to fuse information from multiple detections over time. To do this fusion, the agent must decide which detections to associate with one another. We address this data-association problem in the setting of an online filter, in which each observation is processed by aggregating into an existing object hypothesis. Classic methods with strong probabilistic foundations exist, but they are computationally expensive and require models that can be difficult to acquire. In this work, we use the deep-learning tools of sparse attention and representation learning to learn a machine that processes a stream of detections and outputs a set of hypotheses about objects in the world. We evaluate this approach on simple clustering problems, problems with dynamics, and a complex image-based domain. We find that it generalizes well from short to long observation sequences and from a few to many hypotheses, outperforming other learning approaches and classical non-learning methods.

daf-net, hypothesis, hypothesis slot, (14 more...)

arXiv.org Artificial Intelligence

2011.03183

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

Mixing Consistent Deep Clustering

Lutscher, Daniel, Hassouni, Ali el, Stol, Maarten, Hoogendoorn, Mark

arXiv.org Machine LearningNov-3-2020

Finding well-defined clusters in data represents a fundamental challenge for many data-driven applications, and largely depends on good data representation. Drawing on literature regarding representation learning, studies suggest that one key characteristic of good latent representations is the ability to produce semantically mixed outputs when decoding linear interpolations of two latent representations. We propose the Mixing Consistent Deep Clustering method which encourages interpolations to appear realistic while adding the constraint that interpolations of two data points must look like one of the two inputs. By applying this training method to various clustering (non-)specific autoencoder models we found that using the proposed training method systematically changed the structure of learned representations of a model and it improved clustering performance for the tested ACAI, IDEC, and VAE models on the MNIST, SVHN, and CIFAR-10 datasets. These outcomes have practical implications for numerous real-world clustering tasks, as it shows that the proposed method can be added to existing autoencoders to further improve clustering performance.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Machine Learning

2011.01977

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Regularized spectral methods for clustering signed networks

Cucuringu, Mihai, Singh, Apoorv Vikram, Sulem, Déborah, Tyagi, Hemant

arXiv.org Machine LearningNov-3-2020

We study the problem of $k$-way clustering in signed graphs. Considerable attention in recent years has been devoted to analyzing and modeling signed graphs, where the affinity measure between nodes takes either positive or negative values. Recently, Cucuringu et al. [CDGT 2019] proposed a spectral method, namely SPONGE (Signed Positive over Negative Generalized Eigenproblem), which casts the clustering task as a generalized eigenvalue problem optimizing a suitably defined objective function. This approach is motivated by social balance theory, where the clustering task aims to decompose a given network into disjoint groups, such that individuals within the same group are connected by as many positive edges as possible, while individuals from different groups are mainly connected by negative edges. Through extensive numerical simulations, SPONGE was shown to achieve state-of-the-art empirical performance. On the theoretical front, [CDGT 2019] analyzed SPONGE and the popular Signed Laplacian method under the setting of a Signed Stochastic Block Model (SSBM), for $k=2$ equal-sized clusters, in the regime where the graph is moderately dense. In this work, we build on the results in [CDGT 2019] on two fronts for the normalized versions of SPONGE and the Signed Laplacian. Firstly, for both algorithms, we extend the theoretical analysis in [CDGT 2019] to the general setting of $k \geq 2$ unequal-sized clusters in the moderately dense regime. Secondly, we introduce regularized versions of both methods to handle sparse graphs -- a regime where standard spectral methods underperform -- and provide theoretical guarantees under the same SSBM model. To the best of our knowledge, regularized spectral methods have so far not been considered in the setting of clustering signed graphs. We complement our theoretical results with an extensive set of numerical experiments on synthetic data.

artificial intelligence, machine learning, sym, (18 more...)

arXiv.org Machine Learning

2011.01737

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.45)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Add feedback

Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Karalias, Nikolaos, Loukas, Andreas

arXiv.org Machine LearningNov-3-2020

Combinatorial optimization problems are notoriously challenging for neural networks, especially in the absence of labeled instances. This work proposes an unsupervised learning framework for CO problems on graphs that can provide integral solutions of certified quality. Inspired by Erdos' probabilistic method, we use a neural network to parametrize a probability distribution over sets. Crucially, we show that when the network is optimized w.r.t. a suitably chosen loss, the learned distribution contains, with controlled probability, a low-cost integral solution that obeys the constraints of the combinatorial problem. The probabilistic proof of existence is then derandomized to decode the desired solutions. We demonstrate the efficacy of this approach to obtain valid solutions to the maximum clique problem and to perform local graph clustering. Our method achieves competitive results on both real datasets and synthetic hard instances.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2006.10643

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
(2 more...)

Add feedback

Kernel Two-Dimensional Ridge Regression for Subspace Clustering

Peng, Chong, Zhang, Qian, Kang, Zhao, Chen, Chenglizhao, Cheng, Qiang

arXiv.org Artificial IntelligenceNov-2-2020

Subspace clustering methods have been widely studied recently. When the inputs are 2-dimensional (2D) data, existing subspace clustering methods usually convert them into vectors, which severely damages inherent structures and relationships from original data. In this paper, we propose a novel subspace clustering method for 2D data. It directly uses 2D data as inputs such that the learning of representations benefits from inherent structures and relationships of the data. It simultaneously seeks image projection and representation coefficients such that they mutually enhance each other and lead to powerful data representations. An efficient algorithm is developed to solve the proposed objective function with provable decreasing and convergence property. Extensive experimental results verify the effectiveness of the new method.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2011.01477

Country:

North America > United States > Kentucky (0.04)
Asia > China > Shandong Province > Qingdao (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

An Important Guide To Unsupervised Machine Learning

#artificialintelligenceNov-1-2020, 18:25:24 GMT

We're living in an era of digital switch-over with only one constant – evolve. And that digital transformation is being introduced by high-tech solutions. Hence, it comes as no surprise that mundane business tasks are being completely taken over by tech advancements. Machines, artificial intelligence (AI), and unsupervised learning are reshaping the way businesses vie for a place under the sun. With that being said, let's have a closer look at how unsupervised machine learning is omnipresent in all industries.

algorithm, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Understanding K-means Clustering in Machine Learning

#artificialintelligenceOct-31-2020, 08:25:35 GMT

K-means clustering is one of the simplest and popular unsupervised machine learning algorithms. Typically, unsupervised algorithms make inferences from datasets using only input vectors without referring to known, or labelled, outcomes. AndreyBu, who has more than 5 years of machine learning experience and currently teaches people his skills, says that "the objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset." A cluster refers to a collection of data points aggregated together because of certain similarities. You'll define a target number k, which refers to the number of centroids you need in the dataset.

artificial intelligence, k-means, machine learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.99)

Add feedback

A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

Wang, Zhongruo, Liu, Bingyuan, Chen, Shixiang, Ma, Shiqian, Xue, Lingzhou, Zhao, Hongyu

arXiv.org Machine LearningOct-30-2020

Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity to the spectral clustering and it improves the interpretability of the model. This paper considers a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via the single-cell RNA sequencing data analysis.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

2007.09524

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas (0.04)
North America > United States > Pennsylvania (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback