AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Covariate-guided Bayesian mixture model for multivariate time series

Fu, Haoyi, Tang, Lu, Rosen, Ori, Hipwell, Alison E., Huppert, Theodore J., Krafty, Robert T.

arXiv.org Machine LearningJan-3-2023

With rapid development of techniques to measure brain activity and structure, statistical methods for analyzing modern brain-imaging play an important role in the advancement of science. Imaging data that measure brain function are usually multivariate time series and are heterogeneous across both imaging sources and subjects, which lead to various statistical and computational challenges. In this paper, we propose a group-based method to cluster a collection of multivariate time series via a Bayesian mixture of smoothing splines. Our method assumes each multivariate time series is a mixture of multiple components with different mixing weights. Time-independent covariates are assumed to be associated with the mixture components and are incorporated via logistic weights of a mixture-of-experts model. We formulate this approach under a fully Bayesian framework using Gibbs sampling where the number of components is selected based on a deviance information criterion. The proposed method is compared to existing methods via simulation studies and is applied to a study on functional near-infrared spectroscopy (fNIRS), which aims to understand infant emotional reactivity and recovery from stress. The results reveal distinct patterns of brain activity, as well as associations between these patterns and selected covariates.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Machine Learning

2301.01373

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Texas > El Paso County > El Paso (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Cluster-guided Contrastive Graph Clustering Network

Yang, Xihong, Liu, Yue, Zhou, Sihang, Wang, Siwei, Tu, Wenxuan, Zheng, Qun, Liu, Xinwang, Fang, Liming, Zhu, En

arXiv.org Artificial IntelligenceJan-3-2023

Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.

artificial intelligence, liu, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.01098

Country:

Asia > China > Hunan Province > Changsha (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Tweet's popularity dynamics

Willemin, Ferdinand

arXiv.org Artificial IntelligenceJan-2-2023

This article charts the work of a 4 month project aimed at automatically identifying patterns of tweets' popularity evolution using Machine Learning and Deep Learning techniques. To apprehend both the data and the extent of the problem, a straightforward clustering algorithm based on a point to point distance is used. Then, in an attempt to refine the algorithm, various analyses especially using feature extraction techniques are conducted. Although the algorithm eventually fails to automate such a task, this exercise raises a complex but necessary issue touching on the impact of virality on social networks.

data mining, machine learning, tweet, (19 more...)

arXiv.org Artificial Intelligence

2301.00853

Country:

Europe > France (0.04)
Europe > Ukraine > Luhansk Oblast > Sievierodonetsk (0.04)
Europe > Ukraine > Luhansk Oblast > Luhansk (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

ClusTop: An unsupervised and integrated text clustering and topic extraction framework

Chen, Zhongtao, Mi, Chenghu, Duo, Siwei, He, Jingfei, Zhou, Yatong

arXiv.org Artificial IntelligenceJan-2-2023

Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality clustering result and extract topics from each cluster simultaneously. Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction, where the enhanced language model can be viewed as a bridge between clustering and topic extraction. On one hand, it provides text embeddings with a strong cluster structure which facilitates effective text clustering; on the other hand, it pays high attention on the topic related words for topic extraction because of its self-attention architecture. Moreover, the training of enhanced language model is unsupervised. Experiments on two datasets demonstrate the effectiveness of our framework and provide benchmarks for different model combinations in this framework.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.00818

Country:

Europe > United Kingdom (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.30)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Theory of Machine Learning with Limited Data

Sapir, Marina

arXiv.org Artificial IntelligenceJan-1-2023

Application of machine learning may be understood as deriving new knowledge for practical use through explaining accumulated observations, training set. Peirce used the term abduction for this kind of inference. Here I formalize the concept of abduction for real valued hypotheses, and show that 14 of the most popular textbook ML learners (every learner I tested), covering classification, regression and clustering, implement this concept of abduction inference. The approach is proposed as an alternative to statistical learning theory, which requires an impractical assumption of indefinitely increasing training set for its justification.

artificial intelligence, learner, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2206.07586

Country:

North America > United States > Indiana (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Correlation Clustering Algorithm for Dynamic Complete Signed Graphs: An Index-based Approach

Shakiba, Ali

arXiv.org Artificial IntelligenceJan-1-2023

Clustering is one of the most studied problems in machine learning with various applications in analyzing and visualizing large datasets. There are various models and technique to obtain a partition of elements, such that elements belonging to different partitions are dissimilar to each other and the elements in the same partition are very similar to each other. The problem of correlation clustering, introduced in [1], is known to be an NP-hard problem for the disagree minimization. Therefore, several different approximation solutions based on its IP formulation exist in the literature. Recently, the idea of a 2-approximation algorithm in [1] is extended in [4] for constructing a O (1)-approximation algorithm. The experiments in [4] show acceptable performance for this algorithm in practice, although its theoretical guarantee can be too high, e.g. 1 442 for β = λ =

data mining, machine learning, nao, (19 more...)

arXiv.org Artificial Intelligence

2301.00384

Country:

Asia > Middle East > Iran (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Mixture of von Mises-Fisher distribution with sparse prototypes

Rossi, Fabrice, Barbaro, Florian

arXiv.org Artificial IntelligenceDec-30-2022

Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly adapted for high-dimensional directional data such as texts. We propose in this article to estimate a von Mises mixture using a l 1 penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood one with a path following algorithm. The model's behaviour is studied on simulated data and, we show the advantages of the approach on real data benchmark. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neucom.2022.05.118

2212.14591

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Banking & Finance > Trading (0.92)
Banking & Finance > Financial Services (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Traceable Automatic Feature Transformation via Cascading Actor-Critic Agents

Xiao, Meng, Wang, Dongjie, Wu, Min, Qiao, Ziyue, Wang, Pengfei, Liu, Kunpeng, Zhou, Yuanchun, Fu, Yanjie

arXiv.org Artificial IntelligenceDec-30-2022

Feature transformation for AI is an essential task to boost the effectiveness and interpretability of machine learning (ML). Feature transformation aims to transform original data to identify an optimal feature space that enhances the performances of a downstream ML model. Existing studies either combines preprocessing, feature selection, and generation skills to empirically transform data, or automate feature transformation by machine intelligence, such as reinforcement learning. However, existing studies suffer from: 1) high-dimensional non-discriminative feature space; 2) inability to represent complex situational states; 3) inefficiency in integrating local and global feature information. To fill the research gap, we formulate the feature transformation task as an iterative, nested process of feature generation and selection, where feature generation is to generate and add new features based on original features, and feature selection is to remove redundant features to control the size of feature space. Finally, we present extensive experiments and case studies to illustrate 24.7\% improvements in F1 scores compared with SOTAs and robustness in high-dimensional data.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2212.13402

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Comparative Analysis of Clustering Techniques for Personalized Food Kit Distribution

Francis, Jude, Baby, Rowan K, Abraham, Jacob, S, Ajmal P.

arXiv.org Artificial IntelligenceDec-30-2022

The Government of Kerala had increased the frequency of supply of free food kits owing to the pandemic, however, these items were static and not indicative of the personal preferences of the consumers. This paper conducts a comparative analysis of various clustering techniques on a scaled-down version of a real-world dataset obtained through a conjoint analysis-based survey. Clustering carried out by centroid-based methods such as k means is analyzed and the results are plotted along with SVD, and finally, a conclusion is reached as to which among the two is better. Once the clusters have been formulated, commodities are also decided upon for each cluster. Also, clustering is further enhanced by reassignment, based on a specific cluster loss threshold. Thus, the most efficacious clustering technique for designing a food kit tailored to the needs of individuals is finally obtained.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2212.14874

Country:

Asia > India > Rajasthan (0.04)
Asia > India > Kerala > Thiruvananthapuram (0.04)

Genre: Research Report (0.50)

Industry:

Government (0.88)
Health & Medicine (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Machine Learning with Python

#artificialintelligenceDec-29-2022, 06:00:25 GMT

Get ready to dive into the world of Machine Learning (ML) by using Python! This course is for you whether you want to advance your Data Science career or get started in Machine Learning and Deep Learning. This course will begin with a gentle introduction to Machine Learning and what it is, with topics like supervised vs unsupervised learning, linear & non-linear regression, simple regression and more. You will then dive into classification techniques using different classification algorithms, namely K-Nearest Neighbors (KNN), decision trees, and Logistic Regression. You'll also learn about the importance and different types of clustering such as k-means, hierarchical clustering, and DBSCAN.

algorithm, machine learning, python, (1 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.86)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)

Add feedback