AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Latent Dirichlet Allocation

#artificialintelligence, Aug-1-2022, 16:45:45 GMT

Latent Dirichlet Allocation, or LDA for short, is an unsupervised machine learning algorithm. Similar to the clustering algorithm K-means, LDA will attempt to group words and documents into a predefined number of clusters (i.e., topics). These topics can then be used to organize and search through documents. One of the most popular methods for estimating the LDA model is Gibbs sampling. Let's walk through one iteration of the algorithm.
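
As a rough illustration of what one Gibbs sampling iteration involves, the sketch below runs a single sweep of collapsed Gibbs sampling over a toy corpus. It is not taken from the article: the function names (`init_state`, `gibbs_sweep`), the hyperparameters `alpha` and `beta`, and the integer-encoded toy documents are all assumptions made for this example.

```python
import random

def init_state(docs, n_topics, vocab_size, seed=0):
    """Assign a random topic to every token and build the count tables."""
    rng = random.Random(seed)
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]               # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    return rng, z, doc_topic, topic_word, topic_total

def gibbs_sweep(docs, z, doc_topic, topic_word, topic_total, rng,
                alpha=0.1, beta=0.01):
    """One pass over every token: resample its topic given all the others."""
    n_topics = len(topic_total)
    vocab_size = len(topic_word[0])
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            # Remove the token's current assignment from the counts.
            t = z[d][i]
            doc_topic[d][t] -= 1
            topic_word[t][w] -= 1
            topic_total[t] -= 1
            # Unnormalised conditional probability of each topic for this token.
            weights = [
                (doc_topic[d][k] + alpha)
                * (topic_word[k][w] + beta)
                / (topic_total[k] + vocab_size * beta)
                for k in range(n_topics)
            ]
            # Draw a new topic and put the token back into the counts.
            t = rng.choices(range(n_topics), weights=weights)[0]
            z[d][i] = t
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

# Toy corpus: each document is a list of word ids in range(5).
docs = [[0, 1, 0, 2], [3, 4, 3], [0, 1, 4]]
rng, z, doc_topic, topic_word, topic_total = init_state(docs, n_topics=2, vocab_size=5)
gibbs_sweep(docs, z, doc_topic, topic_word, topic_total, rng)
print(z)  # topic assignment of every token after one sweep
```

In practice the sweep would be repeated many times, and the topic-word and document-topic distributions would be estimated from the count tables afterwards.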

latent dirichlet allocation, probability, topic 0, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)

Late Fusion Multi-view Clustering via Global and Local Alignment Maximization

Wang, Siwei, Liu, Xinwang, Zhu, En

arXiv.org Artificial Intelligence, Aug-1-2022

Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although demonstrating promising performance in various applications, most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering, which could cause over-complicated optimization and intensive computational cost. In this paper, we propose late fusion MVC via alignment maximization to address these issues. To do so, we first reveal the theoretical connection between existing k-means clustering and the alignment between base partitions and the consensus one. Based on this observation, we propose a simple but effective multi-view algorithm termed LF-MVC-GAM. It optimally fuses multiple source information at the partition level from each individual view, and maximally aligns the consensus partition with these weighted base ones. Such an alignment helps to integrate partition-level information and significantly reduces the computational complexity by sufficiently simplifying the optimization procedure. We then design another variant, LF-MVC-LAM, to further improve the clustering performance by preserving the local intrinsic structure among multiple partition spaces. After that, we develop two three-step iterative algorithms to solve the resultant optimization problems with theoretically guaranteed convergence. Further, we provide the generalization error bound analysis of the proposed algorithms. Extensive experiments on eighteen multi-view benchmark datasets demonstrate the effectiveness and efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM, ranging from small to large-scale data items. The codes of the proposed algorithms are publicly available at https://github.com/wangsiwei2010/latefusionalignment.

algorithm, dataset, kernel, (14 more...)

arXiv.org Artificial Intelligence

2208.01198

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Interpretable Time Series Clustering Using Local Explanations

Ozyegen, Ozan, Prayogo, Nicholas, Cevik, Mucahit, Basar, Ayse

arXiv.org Artificial Intelligence, Aug-1-2022

This study focuses on exploring the use of local interpretability methods for explaining time series clustering models. Many of the state-of-the-art clustering models are not directly explainable. To provide explanations for these clustering algorithms, we train classification models to estimate the cluster labels. Then, we use interpretability methods to explain the decisions of the classification models. The explanations are used to obtain insights into the clustering models. We perform a detailed numerical study to test the proposed approach on multiple datasets, clustering models, and classification models. The analysis of the results shows that the proposed approach can be used to explain time series clustering models, specifically when the underlying classification model is accurate. Lastly, we provide a detailed analysis of the results, discussing how our approach can be used in a real-life scenario.

dataset, explanation, interpretability method, (11 more...)

arXiv.org Artificial Intelligence

2208.01152

Country:

North America > Canada > Ontario > Toronto (0.05)
Asia (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Power Industry (0.68)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Evo* 2022 -- Late-Breaking Abstracts Volume

Mora, A. M., Esparcia-Alcázar, A. I.

arXiv.org Artificial Intelligence, Jul-31-2022

This volume contains the Late-Breaking Abstracts accepted at the Evo* 2022 Conference, held in Madrid (Spain) from 20 to 22 April. They were presented as short talks as well as at the conference's poster session. The works present ongoing research and preliminary results investigating the application of different approaches of Evolutionary Computation and other Nature-Inspired techniques to different problems, most of them real-world ones. These are very promising contributions, since they outline some of the upcoming advances and applications in the area of nature-inspired methods, mainly Evolutionary Algorithms.

algorithm, arxiv, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2208.00555

Country:

Europe > Spain > Galicia > Madrid (0.24)
Europe > Portugal > Lisbon > Lisbon (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(11 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Media > Music (1.00)
Banking & Finance (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
(4 more...)

A Small Survey On Event Detection Using Twitter

Datta, Debanjan

arXiv.org Artificial Intelligence, Jul-30-2022

This is evident from popular phenomena such as the effects of fake news and online social movements. However, the data obtained from social media comes with large volume and velocity, accompanied by a significant amount of irrelevant data pertaining to general discussions, personal messages and spam. Social media has been shown to be effective for detecting, forecasting and tracking real-world events. The ability to detect real-world events is crucial and has applications in disease surveillance, commerce, governance and other areas. Thus, extracting useful information and modelling the characteristics of social media to detect real-world events is an important problem. 2 RESEARCH PROBLEM: To outline the research problem we need to define events, which have multiple interpretations.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2011.05801

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Bayesian nonparametric mixture inconsistency for the number of components: How worried should we be in practice?

Chaumeny, Yannis, Moris, Johan van der Molen, Davison, Anthony C., Kirk, Paul D. W.

arXiv.org Machine Learning, Jul-29-2022

We consider the Bayesian mixture of finite mixtures (MFMs) and Dirichlet process mixture (DPM) models for clustering. Recent asymptotic theory has established that DPMs overestimate the number of clusters for large samples and that estimators from both classes of models are inconsistent for the number of clusters under misspecification, but the implications for finite sample analyses are unclear. The final reported estimate after fitting these models is often a single representative clustering obtained using an MCMC summarisation technique, but it is unknown how well such a summary estimates the number of clusters. Here we investigate these practical considerations through simulations and an application to gene expression data, and find that (i) DPMs overestimate the number of clusters even in finite samples, but only to a limited degree that may be correctable using appropriate summaries, and (ii) misspecification can lead to considerable overestimation of the number of clusters in both DPMs and MFMs, but results are nevertheless often still interpretable. We provide recommendations on MCMC summarisation and suggest that although the more appealing asymptotic properties of MFMs provide strong motivation to prefer them, results obtained using MFMs and DPMs are often very similar in practice.

artificial intelligence, machine learning, mixture model, (15 more...)

arXiv.org Machine Learning

2207.14717

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering

van der Pol, Elise, Gemp, Ian, Bachrach, Yoram, Everett, Richard

arXiv.org Artificial Intelligence, Jul-29-2022

Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only a few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than $1$), slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and, in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.
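
To make the idea of transforming a spectrum without changing eigenvectors concrete, the sketch below applies a matrix polynomial to the Laplacian of a tiny two-node graph. The polynomial p(x) = 1 - (1 - x)^k is chosen purely for illustration and is not claimed to be the one used in the paper; the helper names and the edge weight `w` are likewise assumptions. On the interval [0, 1] this p is increasing, so small eigenvalues keep their order while the gaps between them widen.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def dilate(laplacian, k):
    """Return p(L) = I - (I - L)^k, an illustrative spectrum-dilating polynomial."""
    n = len(laplacian)
    eye = identity(n)
    shifted = [[eye[i][j] - laplacian[i][j] for j in range(n)] for i in range(n)]
    power = identity(n)
    for _ in range(k):
        power = matmul(power, shifted)
    return [[eye[i][j] - power[i][j] for j in range(n)] for i in range(n)]

# Two nodes joined by one weak edge of weight w.
# The Laplacian has eigenvalues 0 and 2*w, with eigenvectors [1, 1] and [1, -1].
w = 0.01
laplacian = [[w, -w], [-w, w]]
v = [1.0, -1.0]

original_eigenvalue = matvec(laplacian, v)[0] / v[0]   # equals 2*w
dilated = dilate(laplacian, k=50)
dilated_eigenvalue = matvec(dilated, v)[0] / v[0]      # equals 1 - (1 - 2*w)**50

print(original_eigenvalue, dilated_eigenvalue)
```

The vector [1, -1] remains an eigenvector of the transformed matrix, and [1, 1] still maps to zero, so the eigengap grows while the clustering information carried by the eigenvectors is unchanged.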

approximation, eigenvector, graph laplacian, (11 more...)

arXiv.org Artificial Intelligence

2207.14589

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)

Introduction to K-means Clustering

#artificialintelligence, Jul-28-2022, 06:06:49 GMT

This article will answer these questions. Apart from all this, we will also learn more about K-means clustering and its implementation by defining a K-means fit function. Clustering is an unsupervised learning technique. It is used to group different data points based on similar features or characteristics. For example, a company wants to know to whom it should display a particular ad so that the chances of it being clicked increase.
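
A K-means fit function of the kind mentioned above might look like the following minimal sketch. It is not the article's implementation: the name `kmeans_fit`, its parameters (including the optional `init` list of starting centroids), and the toy data are assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_fit(points, k, init=None, n_iters=100, seed=0):
    """Alternate assignment and centroid updates until labels stop changing."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen data points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels

# Two well-separated groups of 2-D points, interleaved.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centroids, labels = kmeans_fit(points, k=2, init=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

With random initialization (the default when `init` is omitted), the result can depend on the seed, which is why practical implementations often run several restarts and keep the best one.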

algorithm, fit function, k-means clustering, (1 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.79)

Expanding the class of global objective functions for dissimilarity-based hierarchical clustering

Roch, Sebastien

arXiv.org Artificial Intelligence, Jul-28-2022

Background: In hierarchical clustering, one seeks a recursive partitioning of the data that captures clustering information at different levels of granularity. Classical work on the subject mostly takes an algorithmic perspective. In particular, various iterative clustering methods have been developed, including the well-known bottom-up dissimilarity-based approaches single linkage, average linkage, etc. (see, e.g., [Mur12, Chapter 25]). Recent work on dissimilarity-based hierarchical clustering has emphasized a different, optimization-based, perspective. This has led to the introduction of global objective functions for this classical problem [Das16].

artificial intelligence, hierarchy, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2207.14375

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Online Inference for Mixture Model of Streaming Graph Signals with Non-White Excitation

He, Yiran, Wai, Hoi-To

arXiv.org Machine Learning, Jul-28-2022

This paper considers a joint multi-graph inference and clustering problem for simultaneous inference of node centrality and association of graph signals with their graphs. We study a mixture model of filtered low-pass graph signals with possibly non-white and low-rank excitation. While the mixture model is motivated by practical scenarios, it presents significant challenges to prior graph learning methods. As a remedy, we consider an inference problem focusing on the node centrality of graphs. We design an expectation-maximization (EM) algorithm with a unique low-rank plus sparse prior derived from the low-pass signal property. We propose a novel online EM algorithm for inference from streaming data. As an example, we extend the online algorithm to detect if the signals are generated from an abnormal graph. We show that the proposed algorithms converge to a stationary point of the maximum a posteriori (MAP) problem. Numerical experiments support our analysis.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2023.3238272

2207.14019

Country:

Asia > Singapore (0.04)
Asia > Middle East > Iran (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
