AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Clustering of Nonnegative Data and an Application to Matrix Completion

Strohmeier, C., Needell, D.

arXiv.org Machine Learning, Sep-2-2020

In this paper, we propose a simple algorithm to cluster nonnegative data lying in disjoint subspaces. We analyze its performance in relation to a certain measure of correlation between said subspaces. We use our clustering algorithm to develop a matrix completion algorithm which can outperform standard matrix completion algorithms on data matrices satisfying …

Clustering is another typical problem in data science whose aim is to cluster, or group, unlabeled data. That is, one has a data set consisting of two or more families of data points such that members of each family share intrinsic characteristics. Based on these intrinsic characteristics, one must sort the data into its different families. There are now many methods to cluster data along with a wide array of theoretical and empirical support, see e.g. …
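The abstract does not spell out the clustering algorithm, so the following is only a toy illustration of why nonnegativity helps, not the authors' method. It assumes a stronger condition than "disjoint subspaces": each family's vectors share a support that is disjoint from the other families' supports, and vectors within a family overlap enough to be connected. For nonnegative vectors the inner product is zero exactly when the supports are disjoint, so the connected components of the "positive inner product" graph recover the families. The helper name `cluster_by_overlap` and its `tol` parameter are hypothetical.

```python
import numpy as np

def cluster_by_overlap(X, tol=1e-12):
    """Group the columns (samples) of a nonnegative matrix X by shared support.

    Hypothetical helper: two samples are linked when their inner product
    exceeds tol, and clusters are the connected components of that graph.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be nonnegative")
    G = X.T @ X  # for nonnegative vectors, G[i, j] == 0 iff the supports are disjoint
    n = G.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]  # depth-first traversal of one connected component
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(G[i] > tol):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Two families: columns 0-1 live on rows 0-2, columns 2-3 live on rows 3-5.
rng = np.random.default_rng(0)
A = np.zeros((6, 4))
A[:3, :2] = rng.random((3, 2)) + 0.1
A[3:, 2:] = rng.random((3, 2)) + 0.1
print(cluster_by_overlap(A))  # [0 0 1 1]
```

Real subspaces that merely intersect only at the origin can still produce positive inner products across families, which is presumably why the paper analyzes performance against a correlation measure between subspaces.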

artificial intelligence, completion, machine learning

arXiv: 2009.01279

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)


Rank-one partitioning: formalization, illustrative examples, and a new cluster enhancing strategy

Laclau, Charlotte, Iutzeler, Franck, Redko, Ievgen

arXiv.org Machine Learning, Sep-1-2020

In this paper, we introduce and formalize a rank-one partitioning learning paradigm that unifies partitioning methods which summarize a data set with a single vector that is then used to derive the final clustering partition. Using this unification as a starting point, we propose a novel algorithmic solution for the partitioning problem based on rank-one matrix factorization and denoising of piecewise constant signals. Finally, we present an empirical demonstration of our findings and show the robustness of the proposed denoising step. We believe that our work provides a new point of view on several unsupervised learning techniques and helps deepen the understanding of the general mechanisms of data partitioning.
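A minimal sketch of the "single summary vector" idea, under stated assumptions: the data matrix is approximated as $X \approx u v^\top$ with $v$ the top right singular vector (found by power iteration), $u = Xv$ gives one score per sample, and the sorted scores are split at their largest gap. The largest-gap split is a deliberately simple stand-in for the paper's denoising of a piecewise constant signal, it only produces two groups, and the name `rank_one_partition` is hypothetical.

```python
import numpy as np

def rank_one_partition(X, n_iter=200, seed=0):
    """Two-way partition of the rows of X from a rank-one summary X ~ u v^T.

    Hypothetical helper: v is the top right singular vector (power iteration),
    u = X v gives one score per row, and the sorted scores are split at their
    largest gap (a stand-in for piecewise-constant denoising).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    for _ in range(n_iter):  # power iteration on X^T X
        v = X.T @ (X @ v)
        v /= np.linalg.norm(v)
    u = X @ v  # the single summary vector: one score per row
    order = np.argsort(u)
    cut = int(np.argmax(np.diff(u[order])))  # largest gap between consecutive sorted scores
    labels = np.zeros(len(u), dtype=int)
    labels[order[cut + 1:]] = 1
    return labels, u

X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
labels, u = rank_one_partition(X)
print(labels)  # rows 0-1 and rows 2-3 fall in different groups
```

Because the sign of a singular vector is arbitrary, which group receives label 1 can vary; only the partition itself is meaningful.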

artificial intelligence, machine learning, vector

arXiv: 2009.00365

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)


Structured Graph Learning for Clustering and Semi-supervised Classification

Kang, Zhao, Peng, Chong, Cheng, Qiang, Liu, Xinwang, Peng, Xi, Xu, Zenglin, Tian, Ling

arXiv.org Artificial Intelligence, Aug-31-2020

Graphs have become increasingly popular for modeling structures and interactions in a wide variety of problems over the last decade. Graph-based clustering and semi-supervised classification techniques have shown impressive performance. This paper proposes a graph learning framework that preserves both the local and the global structure of data. Specifically, our method uses the self-expressiveness of samples to capture the global structure and an adaptive-neighbor approach to respect the local structure. Furthermore, most existing graph-based methods perform clustering and semi-supervised classification on a graph learned from the original data matrix, which has no explicit cluster structure, so they might not achieve optimal performance. By imposing a rank constraint, the learned graph has exactly $c$ connected components when there are $c$ clusters or classes. As a byproduct, graph learning and label inference are implemented jointly and iteratively in a principled way. Theoretically, we show that our model is equivalent to a combination of kernel k-means and k-means under certain conditions. Extensive experiments on clustering and semi-supervised classification demonstrate that the proposed method outperforms other state-of-the-art methods.
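The rank constraint rests on a standard fact from spectral graph theory: for an undirected graph with nonnegative weights $W$, the multiplicity of the eigenvalue $0$ of the Laplacian $L = D - W$ equals the number of connected components, so $\operatorname{rank}(L) = n - c$ holds exactly when the graph has $c$ components. The sketch below only checks this fact numerically; it does not implement the paper's graph learning framework, and the helper name `count_components` and its tolerance are illustrative choices.

```python
import numpy as np

def count_components(W, tol=1e-9):
    """Number of connected components of an undirected graph with nonnegative weights W.

    Counts the (numerically) zero eigenvalues of the Laplacian L = D - W.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)  # L is symmetric positive semidefinite
    return int(np.sum(eigvals < tol))

# A path 0-1-2 plus a separate edge 3-4: two components.
W = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
print(count_components(W))                                # 2
print(np.linalg.matrix_rank(np.diag(W.sum(axis=1)) - W))  # 3, i.e. n - c
```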

artificial intelligence, graph, machine learning

arXiv: 2008.13429

Country:

North America > United States > Kentucky > Fayette County > Lexington (0.14)
Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Shandong Province > Qingdao (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix

Liang, Weixuan, Zhou, Sihang, Xiong, Jian, Liu, Xinwang, Wang, Siwei, Zhu, En, Cai, Zhiping, Xu, Xin

arXiv.org Machine Learning, Aug-31-2020

Multi-view spectral clustering can effectively reveal the intrinsic cluster structure of data by clustering on an optimal embedding learned across views. Despite promising performance in various applications, most existing methods linearly combine a group of pre-specified first-order Laplacian matrices to construct the optimal Laplacian matrix, which may limit representation capability and leave information unexploited. Moreover, storing and performing complex operations on the $n\times n$ Laplacian matrices incurs high storage and computational costs. To address these issues, this paper first proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix, and then extends it to a late fusion version for accurate and efficient multi-view clustering. Specifically, our algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and the high-order base Laplacian matrices simultaneously. In this way, the representative capacity of the learned optimal Laplacian matrix is enhanced, which helps exploit the hidden high-order connection information among data and improves clustering performance. We design an efficient algorithm with proven convergence to solve the resulting optimization problem. Extensive experimental results on nine datasets demonstrate the superiority of our algorithm over state-of-the-art methods, which verifies the effectiveness and advantages of the proposed algorithm.
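To make "a linear combination of first-order and high-order base Laplacians" concrete, here is a heavily simplified sketch. Several assumptions are ours, not the paper's: the second-order graph of a view is taken to be $A^2$, the combination weights are fixed (`alpha` and a uniform average over views) instead of being learned by a neighborhood search, there is no late fusion, and only two clusters are found, by the sign of the eigenvector of the second-smallest eigenvalue. All function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric nonnegative affinity matrix A."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def multiview_two_way_split(affinities, alpha=0.5):
    """Sign split of the second eigenvector of an averaged first/second-order Laplacian.

    Assumptions for illustration: the second-order graph of a view is A @ A,
    and the combination weights are fixed rather than learned.
    """
    n = affinities[0].shape[0]
    L = np.zeros((n, n))
    for A in affinities:
        A = np.asarray(A, dtype=float)
        L += (1 - alpha) * normalized_laplacian(A) + alpha * normalized_laplacian(A @ A)
    L /= len(affinities)
    _, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return (eigvecs[:, 1] > 0).astype(int)  # eigenvector of the second-smallest eigenvalue

def two_cliques(w_in, w_bridge, size=3):
    """Toy view: two cliques of equal size joined by one weak edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = w_in
    A[size:, size:] = w_in
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = w_bridge
    return A

views = [two_cliques(1.0, 0.01), two_cliques(0.8, 0.02)]
print(multiview_two_way_split(views))  # nodes 0-2 vs nodes 3-5 (label values may be swapped)
```

Note that the base Laplacians here are still $n \times n$ dense matrices, which is exactly the storage cost the paper's late fusion version is meant to reduce.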

algorithm, laplacian matrix, matrix

arXiv: 2008.13539

Country:

North America > Canada > Alberta (0.14)
Asia > China > Hunan Province > Changsha (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Data Science > Data Mining (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)


InClass Nets: Independent Classifier Networks for Nonparametric Estimation of Conditional Independence Mixture Models and Unsupervised Classification

Matchev, Konstantin T., Shyamsundar, Prasanth

arXiv.org Machine Learning, Aug-31-2020

We introduce a new machine-learning-based approach, which we call the Independent Classifier networks (InClass nets) technique, for the nonparametric estimation of conditional independence mixture models (CIMMs). We approach the estimation of a CIMM as a multi-class classification problem, since dividing the dataset into different categories naturally leads to the estimation of the mixture model. InClass nets consist of multiple independent classifier neural networks (NNs), each of which handles one of the variates of the CIMM. Fitting the CIMM to the data is performed by simultaneously training the individual NNs using suitable cost functions. The ability of NNs to approximate arbitrary functions makes our technique nonparametric. Further leveraging the power of NNs, we allow the conditionally independent variates of the model to be individually high-dimensional, which is the main advantage of our technique over existing non-machine-learning-based approaches. We derive some new results on the nonparametric identifiability of bivariate CIMMs, in the form of a necessary and a (different) sufficient condition for a bivariate CIMM to be identifiable. We provide a public implementation of InClass nets as a Python package called RainDancesVI and validate our InClass nets technique with several worked-out examples. Our method also has applications in unsupervised and semi-supervised classification problems.
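For orientation, a bivariate CIMM with $K$ components has the form $P(x_1, x_2) = \sum_{k} w_k\, P_1(x_1 \mid k)\, P_2(x_2 \mid k)$. The sketch below writes this out for small discrete variates and computes the class posterior $P(k \mid x_1, x_2)$, which is the "classification" view the abstract refers to. It is not the InClass nets technique (no neural networks are trained, and the distributions are simply given), and the helper names are hypothetical. The joint table is a sum of $K$ rank-one outer products, so its rank is at most $K$.

```python
import numpy as np

def cimm_joint(weights, p1, p2):
    """Joint table P[a, b] = sum_k w_k * p1[k, a] * p2[k, b] of a discrete bivariate CIMM."""
    return np.einsum("k,ka,kb->ab", weights, p1, p2)

def cimm_posterior(weights, p1, p2, a, b):
    """Class posterior P(k | x1=a, x2=b), proportional to w_k * p1[k, a] * p2[k, b]."""
    unnorm = weights * p1[:, a] * p2[:, b]
    return unnorm / unnorm.sum()

w = np.array([0.5, 0.5])                 # mixture weights
p1 = np.array([[0.9, 0.1], [0.2, 0.8]])  # P(x1 | k), one row per component
p2 = np.array([[0.7, 0.3], [0.1, 0.9]])  # P(x2 | k), one row per component
P = cimm_joint(w, p1, p2)
print(P.sum(), np.linalg.matrix_rank(P))  # total probability (about 1) and rank 2 = K
print(cimm_posterior(w, p1, p2, 0, 0))    # heavily favors component 0
```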

artificial intelligence, machine learning, mixture model

arXiv: 2009.00131

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.92)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)


K-Means Clustering Algorithm

#artificialintelligence, Aug-30-2020, 19:56:10 GMT

K-Means Clustering With Python will help you comprehensively learn all the concepts of the k-means algorithm in machine learning. K-means clustering is one of the most common data analysis techniques used to get an intuition about the structure of the data. It has various applications, such as identifying fake news, filtering spam emails, and customer segmentation.
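The tutorial blurb does not include code, so here is a minimal NumPy sketch of the standard k-means procedure (Lloyd's algorithm): alternately assign each point to its nearest centroid and move each centroid to the mean of its points. It is not taken from the tutorial; the function name, random initialization, and stopping rule are our own choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct points as initial centroids
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each centroid to the mean of its points (keep it if its cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)  # the first three and last three points end up in different clusters
```

Lloyd's algorithm only finds a local optimum of the within-cluster sum of squares, so in practice it is usually run from several initializations.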

artificial intelligence, k-means clustering algorithm, machine learning


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means

Moseley, Benjamin, Wang, Yuyan

arXiv.org Machine Learning, Aug-30-2020

This paper explores hierarchical clustering in the case where pairs of points have dissimilarity scores (e.g. distances) as a part of the input. The recently introduced objective for points with dissimilarity scores results in every tree being a 1/2 approximation if the distances form a metric. This shows the objective does not make a significant distinction between a good and poor hierarchical clustering in metric spaces. Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space. The objective captures the criterion that has motivated the use of divisive clustering algorithms: that when a split happens, points in the same cluster should be more similar than points in different clusters. Moreover, this objective gives reasonable results on ground-truth inputs for hierarchical clustering. The paper builds a theoretical connection between this objective and the bisecting k-means algorithm. This paper proves that the optimal 2-means solution results in a constant approximation for the objective. This is the first paper to show the bisecting k-means algorithm optimizes a natural global objective over the entire tree.
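For readers unfamiliar with bisecting k-means, the sketch below shows the top-down procedure itself: start with one cluster and repeatedly split a cluster in two with 2-means, recording each split so the result is a tree. It does not implement the paper's objective. Two choices are our assumptions: the cluster with the largest sum of squared errors is split next (splitting the largest cluster is another common rule), and each 2-means step is Lloyd's local search initialized along the top principal direction, so it need not return the *optimal* 2-means solution that the paper's guarantee refers to. Function names are hypothetical.

```python
import numpy as np

def two_means(X, n_iter=50):
    """Split X into two groups: Lloyd iterations started from a split along the top principal axis."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    labels = (Xc @ vt[0] > 0).astype(int)  # initial split by sign of the projection
    for _ in range(n_iter):
        if labels.min() == labels.max():   # degenerate split, nothing to refine
            break
        c = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
        new = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

def bisecting_kmeans(X, k):
    """Top-down clustering: repeatedly bisect the cluster with the largest sum of squared errors."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]
    tree = []  # (parent, left, right) index arrays, one entry per split
    while len(clusters) < k:
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        i = int(np.argmax(sse))
        idx = clusters[i]
        if len(idx) < 2 or sse[i] == 0:
            break  # nothing left that can be split
        labels = two_means(X[idx])
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:
            break
        clusters[i:i + 1] = [left, right]
        tree.append((idx, left, right))
    return clusters, tree

X = np.array([[0, 0], [0, 1], [10, 0], [10, 1], [30, 0], [30, 1]], dtype=float)
clusters, tree = bisecting_kmeans(X, k=3)
print(sorted(sorted(c.tolist()) for c in clusters))  # [[0, 1], [2, 3], [4, 5]]
```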

artificial intelligence, machine learning, objective

arXiv: 2008.13235

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.05)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


$K$-way $p$-spectral clustering on Grassmann manifolds

Pasadakis, Dimosthenis, Alappat, Christie Louis, Schenk, Olaf, Wellein, Gerhard

arXiv.org Machine Learning, Aug-30-2020

Spectral methods have gained a lot of recent attention due to the simplicity of their implementation and their solid mathematical background. We revisit spectral graph clustering and reformulate the continuous problem of minimizing the graph Laplacian Rayleigh quotient in the $p$-norm. The value of $p \in (1,2]$ is reduced, promoting sparser solution vectors that correspond to optimal clusters as $p$ approaches one. The computation of multiple $p$-eigenvectors of the graph $p$-Laplacian, a nonlinear generalization of the standard graph Laplacian, is achieved by minimizing our objective function on the Grassmann manifold, which enforces the orthogonality constraint between them. Our approach attempts to bridge the fields of graph clustering and nonlinear numerical optimization, and employs a robust algorithm to obtain high-quality clusters. The benefits of the suggested method are demonstrated on a wide range of artificial and real-world graphs. Our results are compared against standard spectral clustering methods and the current state-of-the-art algorithm for clustering using the graph $p$-Laplacian variant.
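The quantity being minimized can be written down directly. Using a common convention (our assumption about normalization, which the abstract does not state), the $p$-Rayleigh quotient of a vector $u$ on a graph with symmetric weights $w_{ij}$ is $Q_p(u) = \tfrac{1}{2}\sum_{i,j} w_{ij}\,|u_i - u_j|^p \,/\, \sum_i |u_i|^p$. For $p = 2$ it reduces to the familiar $u^\top L u / u^\top u$ with $L = D - W$, which the sketch checks. It does not implement the Grassmann-manifold optimization, and the function name is hypothetical.

```python
import numpy as np

def p_rayleigh_quotient(W, u, p):
    """Q_p(u) = (1/2) * sum_{i,j} w_ij |u_i - u_j|^p / sum_i |u_i|^p, for symmetric W."""
    W = np.asarray(W, dtype=float)
    u = np.asarray(u, dtype=float)
    diffs = np.abs(u[:, None] - u[None, :]) ** p  # |u_i - u_j|^p over all ordered pairs
    return 0.5 * np.sum(W * diffs) / np.sum(np.abs(u) ** p)

# Path graph 0-1-2 and a test vector.
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
u = np.array([1.0, 0.0, -1.0])
L = np.diag(W.sum(axis=1)) - W
print(p_rayleigh_quotient(W, u, 2), u @ L @ u / (u @ u))  # 1.0 1.0
```

The factor $\tfrac{1}{2}$ compensates for each undirected edge appearing twice in the double sum over ordered pairs.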

artificial intelligence, eigenvector, machine learning

arXiv: 2008.13210

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Switzerland (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report (0.84)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Subtask Analysis of Process Data Through a Predictive Model

Wang, Zhi, Tang, Xueying, Liu, Jingchen, Ying, Zhiliang

arXiv.org Artificial Intelligence, Aug-29-2020

Response process data collected from human-computer interactive items contain rich information about respondents' behavioral patterns and cognitive processes. Their irregular formats and large sizes make standard statistical tools difficult to apply. This paper develops a computationally efficient method for exploratory analysis of such process data. The new approach segments a lengthy individual process into a sequence of short subprocesses to achieve complexity reduction, easy clustering, and meaningful interpretation. Each subprocess is considered a subtask. The segmentation is based on sequential action predictability, using a parsimonious predictive model combined with the Shannon entropy. Simulation studies are conducted to assess the performance of the new methods. We use the process data from PIAAC 2012 to demonstrate how exploratory analysis of process data can be done with the new approach.
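As a rough illustration of "segmentation based on sequential action predictability", the sketch below swaps in the simplest possible predictive model: bigram counts of which action follows which. It computes the Shannon entropy (in bits) of the next action given the current one and cuts a sequence after any action whose successor is hard to predict. The bigram model, the fixed `threshold`, the cut rule, and the function names are all our assumptions; the paper's parsimonious predictive model is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def next_action_entropy(sequences):
    """Shannon entropy (bits) of the next action given the current one, from bigram counts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    entropy = {}
    for a, successors in counts.items():
        total = sum(successors.values())
        entropy[a] = -sum((c / total) * math.log2(c / total) for c in successors.values())
    return entropy

def segment(seq, entropy, threshold=1.0):
    """Cut after any action whose successor is hard to predict (entropy >= threshold)."""
    segments, current = [], []
    for a in seq:
        current.append(a)
        if entropy.get(a, 0.0) >= threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

corpus = [
    ["open", "type", "submit", "menu", "sort", "close"],
    ["open", "type", "submit", "menu", "filter", "close"],
]
ent = next_action_entropy(corpus)
print(segment(corpus[0], ent))  # [['open', 'type', 'submit', 'menu'], ['sort', 'close']]
```

After "menu" the respondent chooses between two actions with equal frequency, so that step carries one bit of uncertainty and marks a plausible subtask boundary.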

data mining, machine learning, natural language

arXiv: 2009.00717

Country: North America > United States > California (0.04)

Genre:

Research Report (0.64)
Workflow (0.49)

Industry: Education (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)


The UU-test for Statistical Modeling of Unimodal Data

Chasani, Paraskevi, Likas, Aristidis

arXiv.org Machine Learning, Aug-28-2020

Deciding on the unimodality of a dataset is an important problem in data analysis and statistical modeling. It provides knowledge about the structure of the dataset, i.e., whether the data points have been generated by a probability distribution with a single peak or with more than one peak. Such knowledge is very useful for several data analysis problems, such as deciding on the number of clusters and determining unimodal projections. We propose a technique called the UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset. The method operates on the empirical cumulative distribution function (ecdf) of the dataset. It attempts to build a piecewise linear approximation of the ecdf that is unimodal and models the data sufficiently well, in the sense that the data corresponding to each linear segment follow the uniform distribution. A unique feature of this approach is that, in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model. We present experimental results to assess the ability of the method to decide on unimodality and compare it with the well-known dip-test approach. In addition, for unimodal datasets we evaluate the Uniform Mixture Models provided by the proposed method using the test-set log-likelihood and the two-sample Kolmogorov-Smirnov (KS) test.
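The UU-test is built from two ingredients named in the abstract: the ecdf, and a check that the data under a linear ecdf segment look uniform. The sketch below shows only those building blocks, not the UU-test itself: it computes the ecdf and the one-sample Kolmogorov-Smirnov statistic of a segment against a uniform distribution on that segment. Taking the uniform's endpoints from the data's own minimum and maximum is our assumption for illustration; because the endpoints are estimated, standard KS p-values would not apply, so only the statistic is computed. Function names are hypothetical.

```python
import numpy as np

def ecdf(x):
    """Sorted data and ecdf values F_n(x_(i)) = i / n."""
    xs = np.sort(np.asarray(x, dtype=float))
    return xs, np.arange(1, len(xs) + 1) / len(xs)

def ks_uniform_stat(x):
    """One-sample KS statistic of x against Uniform[min(x), max(x)]."""
    xs, F = ecdf(x)
    n = len(xs)
    lo, hi = xs[0], xs[-1]
    if hi == lo:
        raise ValueError("segment has zero length")
    U = (xs - lo) / (hi - lo)                   # uniform CDF evaluated at each sorted point
    d_plus = np.max(F - U)                      # ecdf above the uniform CDF
    d_minus = np.max(U - np.arange(n) / n)      # ecdf (left limit) below the uniform CDF
    return max(d_plus, d_minus)

evenly_spaced = [0.0, 0.25, 0.5, 0.75, 1.0]
two_clumps = [0.0, 0.01, 0.02, 0.98, 0.99, 1.0]
print(ks_uniform_stat(evenly_spaced))  # about 0.2
print(ks_uniform_stat(two_clumps))     # much larger: far from uniform on [0, 1]
```

A segment whose statistic is small is consistent with the ecdf being linear there, which is the property the UU-test's piecewise linear approximation relies on.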

artificial intelligence, dataset, machine learning

arXiv: 2008.12537

Country: Europe > Greece > Epirus > Ioannina (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
