AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Time Series Clustering with an EM algorithm for Mixtures of Linear Gaussian State Space Models

Umatani, Ryohei, Imai, Takashi, Kawamoto, Kaoru, Kunimasa, Shutaro

arXiv.org Artificial Intelligence, Feb-21-2023

In this paper, we consider the task of clustering a set of individual time series while modeling each cluster, that is, model-based time series clustering. The task requires a parametric model with sufficient flexibility to describe the dynamics in various time series. To address this problem, we propose a novel model-based time series clustering method with mixtures of linear Gaussian state space models, which are highly flexible. The proposed method uses a new expectation-maximization algorithm for the mixture model to estimate the model parameters, and determines the number of clusters using the Bayesian information criterion. Experiments on a simulated dataset demonstrate the effectiveness of the method in clustering, parameter estimation, and model selection. The method is also applied to real datasets commonly used to evaluate time series clustering methods. The results show that the proposed method produces clustering results that are as accurate as or more accurate than those obtained using previous methods.
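The sketch below illustrates the EM-plus-BIC idea under one simplification. Each cluster here is a plain univariate Gaussian, not the linear Gaussian state space model the paper uses. The function names (`em_gmm_1d`, `bic`, `select_k`) are hypothetical.

EM alternates between two steps:

- E-step: compute the responsibility of each component for each point.
- M-step: re-estimate the weights, means and variances.

BIC then scores each candidate number of clusters as p·ln(n) − 2·ln L, and the lowest score is selected.

```python
import math

def _log_normal(x, mean, var):
    # log density of a univariate Gaussian N(mean, var) evaluated at x
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_likelihood(xs, weights, means, variances):
    return sum(_logsumexp([math.log(w) + _log_normal(x, m, v)
                           for w, m, v in zip(weights, means, variances)]) for x in xs)

def em_gmm_1d(xs, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic initialization: means at evenly spaced quantiles, pooled variance, equal weights
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: resp[i][j] = posterior probability that x_i came from component j
        resp = []
        for x in xs:
            logs = [math.log(w) + _log_normal(x, m, v) for w, m, v in zip(weights, means, variances)]
            norm = _logsumexp(logs)
            resp.append([math.exp(lp - norm) for lp in logs])
        # M-step: closed-form updates given the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-10:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                               var_floor)
    return weights, means, variances

def bic(xs, k):
    weights, means, variances = em_gmm_1d(xs, k)
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
    return n_params * math.log(len(xs)) - 2.0 * log_likelihood(xs, weights, means, variances)

def select_k(xs, candidates=(1, 2, 3)):
    # The candidate number of clusters with the lowest BIC is chosen
    return min(candidates, key=lambda k: bic(xs, k))
```

With two well-separated groups of points, `select_k(xs)` would be expected to return 2. Per the abstract, the paper uses the Bayesian information criterion in the same role for its state space mixture.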

artificial intelligence, machine learning, time series


doi: 10.1016/j.patcog.2023.109375

arXiv:2208.11907

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)


Boosting Nyström Method

Hamm, Keaton, Lu, Zhaoying, Ouyang, Wenbo, Zhang, Hao Helen

arXiv.org Artificial Intelligence, Feb-21-2023

The Nyström method is an effective tool for generating low-rank approximations of large matrices, and it is particularly useful for kernel-based learning. To improve the standard Nyström approximation, ensemble Nyström algorithms compute a mixture of Nyström approximations that are generated independently by column resampling. We propose a new family of algorithms, boosting Nyström, which iteratively and adaptively generates a sequence of "weak" Nyström approximations (each using a small number of columns), in which each approximation aims to compensate for the weaknesses of its predecessor, and then combines them to form one strong approximation. We demonstrate that our boosting Nyström algorithms can yield more efficient and accurate low-rank approximations to kernel matrices. Improvements over the standard and ensemble Nyström methods are illustrated by simulation studies and real-world data analysis.
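The standard (non-boosted) Nyström approximation, which both the ensemble and boosting variants build on, can be sketched in pure Python:

1. Pick m landmark columns L.
2. Form C = K[:, L] and W = K[L, L].
3. Approximate K ≈ C W⁻¹ Cᵀ.

The following choices are assumptions for illustration:

- the Gaussian RBF kernel and the `gamma` value;
- the helper names;
- inverting W directly, which presumes W is well conditioned (a pseudo-inverse is the safer general choice).

The paper's boosting procedure is not reproduced.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def invert(matrix):
    # Gauss-Jordan elimination with partial pivoting; assumes the matrix is non-singular
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def nystrom(points, landmarks, gamma=1.0):
    """Return the n x n Nystrom approximation C W^-1 C^T built from the given landmark indices."""
    n, m = len(points), len(landmarks)
    # C[i][a] = k(x_i, x_landmarks[a]): an n x m block of the full kernel matrix
    C = [[rbf_kernel(p, points[j], gamma) for j in landmarks] for p in points]
    # W = K[L, L]: the m x m block among the landmarks themselves
    W = [C[i] for i in landmarks]
    W_inv = invert(W)
    CW = [[sum(row[a] * W_inv[a][b] for a in range(m)) for b in range(m)] for row in C]
    return [[sum(x * y for x, y in zip(CW[i], C[j])) for j in range(n)] for i in range(n)]
```

Using every point as a landmark recovers K exactly. With fewer landmarks, the landmark rows and columns are still reproduced exactly, and the remaining entries are approximated.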

approximation, artificial intelligence, machine learning


arXiv:2302.11032

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > United States > Colorado (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)


Robust Fair Clustering: A Novel Fairness Attack and Defense Framework

Chhabra, Anshuman, Li, Peizhao, Mohapatra, Prasant, Liu, Hongfu

arXiv.org Artificial Intelligence, Feb-20-2023

Clustering algorithms are widely used in many societal resource allocation applications, such as loan approval and candidate recruitment; hence, biased or unfair model outputs can adversely affect the individuals who rely on these applications. To counteract this issue, many fair clustering approaches have recently been proposed. Due to the potential for significant harm, it is essential to ensure that fair clustering algorithms provide consistently fair outputs even under adversarial influence. However, fair clustering algorithms have not been studied from an adversarial attack perspective. In contrast to previous research, we seek to bridge this gap by conducting a robustness analysis of fair clustering through a novel black-box fairness attack. Through comprehensive experiments, we find that state-of-the-art models are highly susceptible to our attack, which can significantly reduce their fairness performance. Finally, we propose Consensus Fair Clustering (CFC), the first robust fair clustering approach, which transforms consensus clustering into a fair graph partitioning problem and iteratively learns to generate fair cluster outputs. Experimentally, we observe that CFC is highly robust to the proposed attack and is thus a truly robust fair clustering alternative.
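The abstract does not say how fairness performance is measured. One widely used metric for fair clustering with a protected attribute is balance: the worst-case ratio between the smallest and largest group counts inside any cluster.

Here is a minimal sketch, treating both the metric choice and the function name `balance` as assumptions. Under this metric, a fairness attack succeeds to the extent that it drives the value toward 0.

```python
from collections import Counter, defaultdict

def balance(labels, groups):
    """Balance of a clustering: min over clusters of (smallest group count / largest group count).
    1.0 means every cluster holds all groups in equal numbers; 0.0 means some cluster lacks a group."""
    all_groups = set(groups)
    counts = defaultdict(Counter)  # cluster label -> Counter of protected-group membership
    for cluster, group in zip(labels, groups):
        counts[cluster][group] += 1
    worst = 1.0
    for per_group in counts.values():
        sizes = [per_group[g] for g in all_groups]  # a Counter returns 0 for absent groups
        worst = min(worst, min(sizes) / max(sizes))
    return worst
```

For example, `balance([0, 0, 1, 1], ["a", "b", "a", "b"])` is 1.0, while reassigning points so that each cluster holds a single group gives 0.0.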

artificial intelligence, data mining, machine learning


arXiv:2210.01953

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California > Yolo County > Davis (0.04)

Genre: Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


AutoAC: Towards Automated Attribute Completion for Heterogeneous Graph Neural Network

Zhu, Guanghui, Zhu, Zhennan, Wang, Wenjie, Xu, Zhuoer, Yuan, Chunfeng, Huang, Yihua

arXiv.org Artificial Intelligence, Feb-20-2023

Much real-world data can be modeled as heterogeneous graphs that contain multiple types of nodes and edges. Meanwhile, owing to their excellent performance, heterogeneous graph neural networks (GNNs) have received increasing attention. However, existing work mainly focuses on the design of novel GNN models while ignoring another important issue that also has a large impact on model performance: the missing attributes of some node types. Handcrafted attribute completion requires extensive expert experience and domain knowledge. Also, given the differences in semantic characteristics between nodes, attribute completion should be fine-grained, i.e., the completion operation should be node-specific. Moreover, to improve the performance of the downstream graph learning task, attribute completion and the training of the heterogeneous GNN should be jointly optimized rather than treated as two separate processes. To address these challenges, we propose a differentiable attribute completion framework called AutoAC for automated completion operation search in heterogeneous GNNs. We first propose an expressive completion operation search space, including topology-dependent and topology-independent completion operations. Then, we propose a continuous relaxation schema and a differentiable completion algorithm in which the completion operation search is formulated as a bi-level joint optimization problem. To improve search efficiency, we leverage two optimization techniques: discrete constraints and auxiliary unsupervised graph node clustering. Extensive experimental results on real-world datasets reveal that AutoAC outperforms the SOTA handcrafted heterogeneous GNNs and the existing attribute completion method.
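AutoAC searches over a space of completion operations for each node. The sketch below shows only one simple topology-dependent operation: filling a node's missing attribute vector with the mean of its attributed neighbors.

The following are illustrative assumptions, not AutoAC's actual search space or API:

- this particular operation;
- the dict-based graph format;
- the function name.

```python
def complete_by_neighbor_mean(features, adjacency):
    """features: node -> list of floats, or None if the node's attributes are missing.
    adjacency: node -> list of neighbor nodes (possibly of other node types).
    Returns a new dict where each missing vector is the mean of its attributed neighbors'
    vectors, or stays None when no neighbor has attributes."""
    completed = {}
    for node, feat in features.items():
        if feat is not None:
            completed[node] = list(feat)
            continue
        observed = [features[nb] for nb in adjacency.get(node, []) if features.get(nb) is not None]
        if not observed:
            completed[node] = None  # a topology-independent fallback would be needed here
            continue
        # Element-wise mean over the neighbors' attribute vectors
        completed[node] = [sum(vals) / len(observed) for vals in zip(*observed)]
    return completed
```

Because every node here receives the same fixed operation, this sketch is the kind of handcrafted completion the abstract contrasts with AutoAC's node-specific, jointly optimized search.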

artificial intelligence, deep learning, machine learning


arXiv:2301.03049

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Media (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


HiURE: Hierarchical Exemplar Contrastive Learning for Unsupervised Relation Extraction

Hu, Xuming, Liu, Shuliang, Zhang, Chenwei, Li, Shu'ang, Wen, Lijie, Yu, Philip S.

arXiv.org Artificial Intelligence, Feb-20-2023

Unsupervised relation extraction aims to extract the relationships between entities from natural language sentences without prior information on relational scope or distribution. Existing works either use self-supervised schemes that refine relational feature signals by iteratively leveraging adaptive clustering and classification, which provokes gradual drift problems, or adopt instance-wise contrastive learning, which unreasonably pushes apart sentence pairs that are semantically similar. To overcome these defects, we propose a novel contrastive learning framework named HiURE, which can derive hierarchical signals from the relational feature space using cross-hierarchy attention and effectively optimize the relation representations of sentences under exemplar-wise contrastive learning. Experimental results on two public datasets demonstrate the effectiveness and robustness of HiURE for unsupervised relation extraction compared with state-of-the-art models.
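Roughly speaking, exemplar-wise contrastive learning compares a sentence representation against cluster exemplars rather than against every other instance. This avoids pushing apart semantically similar sentences that happen to be different instances.

Below is a minimal single-level InfoNCE-style loss over exemplars. Cosine similarity, the temperature of 0.1 and the function names are hypothetical choices. HiURE's hierarchical exemplars and cross-hierarchy attention are not reproduced.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def exemplar_infonce(embedding, exemplars, positive_idx, temperature=0.1):
    """Negative log-softmax of the similarity to the positive exemplar,
    normalized over the similarities to all exemplars."""
    logits = [cosine(embedding, e) / temperature for e in exemplars]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    return log_denominator - logits[positive_idx]
```

The loss is near zero when the sentence is already closest to its own exemplar. It grows as the sentence drifts toward another exemplar.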

exemplar, machine learning, natural language


arXiv:2205.02225

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Indonesia > Bali (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)


Unsupervised Diffusion and Volume Maximization-Based Clustering of Hyperspectral Images

Polk, Sam L., Cui, Kangning, Chan, Aland H. Y., Coomes, David A., Plemmons, Robert J., Murphy, James M.

arXiv.org Artificial Intelligence, Feb-19-2023

Hyperspectral images taken from aircraft or satellites contain information from hundreds of spectral bands, within which lie latent lower-dimensional structures that can be exploited for classifying vegetation and other materials. A disadvantage of working with hyperspectral images is that, due to an inherent trade-off between spectral and spatial resolution, they have a relatively coarse spatial scale, meaning that single pixels may correspond to spatial regions containing multiple materials. This article introduces the Diffusion and Volume maximization-based Image Clustering (D-VIC) algorithm for unsupervised material clustering to address this problem. By directly incorporating pixel purity into its labeling procedure, D-VIC gives greater weight to pixels that correspond to a spatial region containing just a single material. D-VIC is shown to outperform comparable state-of-the-art methods in extensive experiments on a range of hyperspectral images, including land-use maps and highly mixed forest health surveys (in the context of ash dieback disease), implying that it is well-equipped for unsupervised material clustering of spectrally-mixed hyperspectral datasets.

artificial intelligence, data mining, machine learning


doi: 10.3390/rs15041053

arXiv:2203.09992

Country:

North America (0.68)
Europe > United Kingdom > England > Cambridgeshire (0.28)

Genre: Research Report (0.84)

Industry: Energy > Oil & Gas (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)


Probabilistic Back-ends for Online Speaker Recognition and Clustering

Sholokhov, Alexey, Kuzmin, Nikita, Lee, Kong Aik, Chng, Eng Siong

arXiv.org Artificial Intelligence, Feb-19-2023

This paper focuses on multi-enrollment speaker recognition, which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario. First, we show that popular cosine scoring suffers from poor score calibration when the number of enrollment utterances varies. Second, we propose a simple replacement for cosine scoring based on an extremely constrained version of probabilistic linear discriminant analysis (PLDA). The proposed model improves on cosine scoring for multi-enrollment recognition while keeping the same performance for one-to-one comparisons. Finally, we consider an online speaker clustering task in which each step naturally involves multi-enrollment recognition. We propose an online clustering algorithm that benefits from properties of the PLDA model such as its ability to handle uncertainty and its better score calibration. Our experiments demonstrate the effectiveness of the proposed algorithm.
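For reference, the cosine-scoring baseline the abstract critiques can be sketched as follows:

- Enrollment embeddings are length-normalized and averaged.
- Each incoming embedding either joins the best-scoring cluster or starts a new one.

The fixed threshold of 0.5 and all function names are hypothetical. A single threshold is applied regardless of how many utterances a cluster already holds, which is the multi-enrollment setting where the abstract reports poor calibration. The paper's constrained-PLDA back-end is not reproduced here.

```python
import math

def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_score(test, enrollments):
    """Cosine score between a test embedding and the mean of the normalized enrollment embeddings."""
    normed = [_normalize(e) for e in enrollments]
    centroid = [sum(vals) / len(normed) for vals in zip(*normed)]
    return sum(a * b for a, b in zip(_normalize(test), _normalize(centroid)))

def online_cluster(embeddings, threshold=0.5):
    clusters = []  # each cluster is the list of embeddings enrolled so far
    labels = []
    for emb in embeddings:
        scores = [cosine_score(emb, members) for members in clusters]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
            clusters[best].append(emb)  # multi-enrollment: the cluster grows by one utterance
            labels.append(best)
        else:
            clusters.append([emb])  # no sufficiently similar speaker yet: open a new cluster
            labels.append(len(clusters) - 1)
    return labels
```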

artificial intelligence, machine learning, pattern recognition


arXiv:2302.09523

Country:

Asia > Singapore (0.04)
North America > United States > New York (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Speech Recognition (0.63)


Understanding Unsupervised Machine Learning

#artificialintelligence, Feb-18-2023, 05:25:18 GMT

In supervised machine learning, a labeled dataset is used to train the model. For example, we might train a model to predict house prices from features such as area, number of bedrooms, and location. In unsupervised machine learning, we do not have a labeled dataset; the goal is to find patterns and relationships in the data. Clustering is one of the most popular unsupervised learning techniques.
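k-means, one of the techniques in this article's tags, can be sketched in a few lines of pure Python using Lloyd's algorithm. The algorithm alternates two steps until the assignments stop changing:

1. Assign each point to its nearest centroid.
2. Move each centroid to the mean of its points.

Seeding with the first k points is a simplifying assumption; k-means++ seeding is the usual choice in practice.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive seeding: the first k points
    labels = None
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid by squared Euclidean distance
        new_labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids
```

No labels are used anywhere: the groups emerge from distances alone, which is what distinguishes this from the supervised house-price example.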

algorithm, dataset, k-means


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Practical Example of Clustering and Radial Basis Functions (RBF)

#artificialintelligence, Feb-17-2023, 05:05:17 GMT

Clustering is a technique used in machine learning and data analysis to group similar data points together. The goal of clustering is to identify patterns and relationships in the data without any prior knowledge of the underlying structure. Clustering is commonly used in unsupervised learning, where the algorithm is not given any labeled data and must find its own structure in the data. There are numerous applications of clustering in various fields such as finance, marketing, biology, social networks, image and video processing, and many more. There are several different algorithms that can be used for clustering, including k-means, hierarchical clustering, and DBSCAN.
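Of the algorithms listed, DBSCAN differs from k-means in two ways: it does not take the number of clusters as input, and it can label outlying points as noise.

Below is a compact sketch. It assumes Euclidean distance and a brute-force O(n²) neighbor search, which is fine for small examples but not for large datasets. `eps` and `min_pts` are the usual DBSCAN parameters, and a point counts toward its own `min_pts`.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters and -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also a core point: keep expanding
    return labels
```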

algorithm, rbf function, rbf kernel


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.54)


Data-driven framework for input/output lookup tables reduction: Application to hypersonic flows in chemical non-equilibrium

Scherding, Clément, Rigas, Georgios, Sipp, Denis, Schmid, Peter J., Sayadi, Taraneh

arXiv.org Artificial Intelligence, Feb-17-2023

In this paper, we present a novel model-agnostic machine learning technique to extract a reduced thermochemical model for simulations of reacting hypersonic flows. An initial simulation gathers all relevant thermodynamic states and the corresponding gas properties via a given model. The states are embedded in a low-dimensional space and clustered to identify regions with different levels of thermochemical (non-)equilibrium. Then, a surrogate surface from the reduced cluster space to the output space is generated using radial-basis-function networks. The method is validated and benchmarked on a simulation of a hypersonic flat-plate boundary layer with finite-rate chemistry. The gas properties of the reactive air mixture are initially modeled using the open-source Mutation++ library. Substituting Mutation++ with the lightweight, machine-learned alternative improves the performance of the solver by 50% while maintaining overall accuracy.
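The surrogate step, a radial-basis-function mapping from the reduced space to output quantities, can be illustrated with plain Gaussian RBF interpolation:

1. Solve Φw = y for the weights, where Φᵢⱼ = exp(−s‖cᵢ − cⱼ‖²).
2. Predict with the weighted sum of basis functions.

The centers, the shape parameter `s` (named `shape` below), the toy data and the function names are hypothetical. The abstract does not describe the paper's actual network configuration.

```python
import math

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_rbf(centers, values, shape=1.0):
    """Fit a Gaussian RBF interpolant through (center, value) pairs; returns a predict function."""
    def phi(sq_dist):
        return math.exp(-shape * sq_dist)

    def sq(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y))

    gram = [[phi(sq(ci, cj)) for cj in centers] for ci in centers]
    weights = _solve(gram, values)

    def predict(x):
        return sum(w * phi(sq(x, c)) for w, c in zip(weights, centers))
    return predict
```

The interpolant reproduces the training values exactly at the centers. Its behavior between centers depends strongly on `shape`, which in practice would be tuned.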

artificial intelligence, machine learning, simulation


doi: 10.1103/PhysRevFluids.8.023201

arXiv:2210.04269

Country:

North America > United States (0.68)
Europe (0.67)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas > Upstream (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
