AITopics | clusterer

clusterer

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering

Holder, Christopher, Bagnall, Anthony

arXiv.org Artificial Intelligence, Nov-26-2024

Time series data has become increasingly prevalent across numerous domains, driving a growing demand for time series machine learning techniques. Among these, time series clustering (TSCL) stands out as one of the most popular machine learning tasks. TSCL serves as a powerful exploratory analysis tool and is also employed as a preprocessing step or subroutine for various tasks, including anomaly detection, segmentation, and classification. The most popular TSCL algorithms are either fast (in terms of run time) but perform poorly on benchmark problems, or perform well on benchmarks but scale poorly. We present a new TSCL algorithm, the $k$-means (K) accelerated (A) Stochastic subgradient (S) Barycentre (B) Average (A) (KASBA) clustering algorithm. KASBA is a $k$-means clustering algorithm that uses the Move-Split-Merge (MSM) elastic distance at all stages of clustering, applies randomised stochastic subgradient descent to find barycentre centroids, links each stage of clustering to accelerate convergence, and exploits the metric property of the MSM distance to avoid a large proportion of distance calculations. It is a versatile and scalable clusterer designed for real-world TSCL applications, and it allows practitioners to balance run time and clustering performance. We demonstrate through extensive experimentation that KASBA produces significantly better clusterings than the faster state-of-the-art clusterers and offers orders-of-magnitude improvement in run time over the most performant $k$-means alternatives.
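KASBA relies on the MSM distance at every stage. As an illustration of that distance only, here is a minimal pure-Python sketch of the standard O(mn) dynamic program for univariate MSM with a constant split/merge cost `c`. The function names and the default `c = 1.0` are assumptions for this example, not values from the paper, and nothing here reproduces KASBA's barycentre updates or its pruning.

```python
from typing import Sequence


def _split_merge_cost(new: float, a: float, b: float, c: float) -> float:
    """Cost of a split or merge: c if `new` lies between a and b,
    otherwise c plus the distance from `new` to the nearer of a and b."""
    if min(a, b) <= new <= max(a, b):
        return c
    return c + min(abs(new - a), abs(new - b))


def msm_distance(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Move-Split-Merge distance between two univariate series (O(m*n) DP)."""
    m, n = len(x), len(y)
    if m == 0 or n == 0:
        raise ValueError("series must be non-empty")
    # cost[i][j] holds the MSM distance between x[: i + 1] and y[: j + 1]
    cost = [[0.0] * n for _ in range(m)]
    cost[0][0] = abs(x[0] - y[0])
    for i in range(1, m):
        cost[i][0] = cost[i - 1][0] + _split_merge_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, n):
        cost[0][j] = cost[0][j - 1] + _split_merge_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, m):
        for j in range(1, n):
            move = cost[i - 1][j - 1] + abs(x[i] - y[j])
            split_merge_x = cost[i - 1][j] + _split_merge_cost(x[i], x[i - 1], y[j], c)
            split_merge_y = cost[i][j - 1] + _split_merge_cost(y[j], x[i], y[j - 1], c)
            cost[i][j] = min(move, split_merge_x, split_merge_y)
    return cost[m - 1][n - 1]
```

Because MSM is a metric, a clustering loop can use the triangle inequality to skip work: if x is currently closest to centroid a and d(a, b) ≥ 2·d(x, a), then b cannot be closer to x than a, so d(x, b) need not be computed. This is the general principle the abstract alludes to; KASBA's actual bounding scheme may differ.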

algorithm, centroid, time series, (16 more...)

arXiv.org Artificial Intelligence

2411.17838

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders

Lowe, Scott C., Haurum, Joakim Bruslund, Oore, Sageev, Moeslund, Thomas B., Taylor, Graham W.

arXiv.org Artificial Intelligence, Jun-4-2024

Self-supervised learning (SSL) has attracted great interest in recent years across almost every machine learning sub-field, due to the promise of being able to harness large quantities of unlabelled data and obtain generic feature embeddings useful for a variety of downstream tasks (Balestriero et al., 2023). This has, for example, led to the development of impressive large language models (Brown et al., 2020) and computer vision systems trained on 1 billion images (Goyal et al., 2021). However, while the embeddings from an SSL-trained encoder can perform well on downstream tasks after fine-tuning the network, there has been less investigation into the utility of the embeddings without fine-tuning. Prior work (Vaze et al., 2022; Zhou and Zhang, 2022) suggests SSL feature encoders generate embeddings suitable for clustering, but nonetheless adjusts the feature encoders through fine-tuning. Yet widespread interest in applying large pretrained models to custom datasets, combined with the prohibitive cost of compute, makes this question important and increasingly urgent. We find that to date there has been no investigation into whether SSL-trained feature encoders can serve as a foundation for clustering, yielding informative groupings of embeddings on real-world datasets that were totally unseen to the encoder during its training. Vaze et al. (2023) showed that features from SSL encoders are typically biased toward shape features and not color, texture, or count when clustered using K-Means. However, this was conducted using a synthetic dataset, where very specific object attributes could be disentangled.
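The evaluation setting described here (embed once with a frozen encoder, then cluster with no fine-tuning) can be sketched in a few lines. In this illustrative example, a precomputed list of vectors stands in for real SSL features, and the helper names, the L2 normalisation step, and the farthest-first initialisation are assumptions rather than details taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalise(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, n_iter: int = 50) -> List[int]:
    """Lloyd's K-Means with deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next seed: the point farthest from its nearest existing centroid
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_frozen_embeddings(embeddings: List[Vector], k: int) -> List[int]:
    """Cluster precomputed (frozen-encoder) embeddings without any fine-tuning."""
    return kmeans([l2_normalise(e) for e in embeddings], k)
```

In practice the vectors would come from a forward pass of the pretrained encoder, and the resulting labels would be compared with held-out ground truth using a clustering metric such as adjusted mutual information.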

clusterer, dataset, encoder, (14 more...)

arXiv.org Artificial Intelligence

2406.02465

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Kansas (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

Monsefi, Amin Karimi, Karisani, Payam, Zhou, Mengxi, Choi, Stacey, Doble, Nathan, Ji, Heng, Parthasarathy, Srinivasan, Ramnath, Rajiv

arXiv.org Artificial Intelligence, Feb-9-2024

Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, consequently, the limited labeled training data available. Additionally, once deployed, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate these challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. In addition, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods on numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.
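The masking half of the SSL method can be illustrated with a hypothetical helper that splits a 3D volume into non-overlapping cubic patches and randomly hides a fixed fraction of them, as in masked-image-modelling pre-training. The function names, patch size, and mask ratio below are assumptions for illustration; the paper's actual masking scheme and its contrastive branch are not reproduced.

```python
import random
from typing import List, Set, Tuple

Index3 = Tuple[int, int, int]


def random_patch_mask(
    shape: Tuple[int, int, int],
    patch: int = 16,
    mask_ratio: float = 0.6,
    seed: int = 0,
) -> List[Index3]:
    """Pick which cubic patches of a (depth, height, width) volume to hide.

    Hypothetical helper: every dimension must be divisible by `patch`.
    Returns sorted (d, h, w) patch-grid indices.
    """
    if any(s % patch for s in shape):
        raise ValueError("each dimension must be divisible by the patch size")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    grid_d, grid_h, grid_w = (s // patch for s in shape)
    patches = [
        (d, h, w) for d in range(grid_d) for h in range(grid_h) for w in range(grid_w)
    ]
    n_mask = round(mask_ratio * len(patches))
    rng = random.Random(seed)  # seeded so the same mask can be reproduced
    return sorted(rng.sample(patches, n_mask))


def voxel_is_masked(voxel: Index3, masked: Set[Index3], patch: int = 16) -> bool:
    """True if the voxel at (z, y, x) falls inside one of the masked patches."""
    z, y, x = voxel
    return (z // patch, y // patch, x // patch) in masked
```

In a typical masked pre-training setup, the network would receive the volume with these patches hidden and be trained to recover them; how LoGoNet combines this with its contrastive objective is described in the paper.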

dataset, logonet, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2402.0619

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > Ohio > Franklin County > Columbus (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.92)
Research Report > Promising Solution (0.86)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

Wang, Quan, Huang, Yiling, Lu, Han, Zhao, Guanlong, Moreno, Ignacio Lopez

arXiv.org Artificial Intelligence, Jan-8-2024

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems. In this paper, we demonstrate that a multi-stage clustering strategy that uses different clustering algorithms for input of different lengths can address multi-faceted challenges of on-device speaker diarization applications. Specifically, a fallback clusterer is used to handle short-form inputs; a main clusterer is used to handle medium-length inputs; and a pre-clusterer is used to compress long-form inputs before they are processed by the main clusterer. Both the main clusterer and the pre-clusterer can be configured with an upper bound of the computational complexity to adapt to devices with different resource constraints. This multi-stage clustering strategy is critical for streaming on-device speaker diarization systems, where the budgets of CPU, memory and battery are tight.
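The length-based dispatch described above can be sketched as follows. Everything here is a hypothetical stand-in: the thresholds `short_max` and `main_max`, the greedy cosine-threshold clusterer used for both the fallback and the default main stage, and the chunk-averaging pre-clusterer are simplifications chosen to keep the example self-contained. A real system would plug a stronger main clusterer (for example, spectral clustering) into the `main` argument.

```python
import math
from typing import Callable, List, Sequence

Embedding = Sequence[float]
Clusterer = Callable[[List[Embedding]], List[int]]


def cosine(a: Embedding, b: Embedding) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def threshold_cluster(embs: List[Embedding], threshold: float = 0.7) -> List[int]:
    """Greedy clusterer: join the first cluster whose representative is similar enough."""
    reps: List[Embedding] = []
    labels: List[int] = []
    for e in embs:
        for k, r in enumerate(reps):
            if cosine(e, r) >= threshold:
                labels.append(k)
                break
        else:  # no similar representative found: start a new cluster
            reps.append(e)
            labels.append(len(reps) - 1)
    return labels


def pre_cluster(embs: List[Embedding], max_groups: int):
    """Compress a long input into at most `max_groups` contiguous chunks and their means."""
    size = math.ceil(len(embs) / max_groups)
    groups = [list(range(i, min(i + size, len(embs)))) for i in range(0, len(embs), size)]
    centroids = [[sum(col) / len(g) for col in zip(*(embs[i] for i in g))] for g in groups]
    return groups, centroids


def multi_stage_cluster(
    embs: List[Embedding],
    main: Clusterer = threshold_cluster,
    fallback: Clusterer = threshold_cluster,
    short_max: int = 5,
    main_max: int = 100,
) -> List[int]:
    """Choose a clustering path by input length; the main clusterer never sees more than `main_max` items."""
    if len(embs) <= short_max:
        return fallback(embs)  # short-form input
    if len(embs) <= main_max:
        return main(embs)  # medium-length input
    groups, centroids = pre_cluster(embs, main_max)  # long-form input
    group_labels = main(centroids)
    labels = [0] * len(embs)
    for g, lab in zip(groups, group_labels):
        for i in g:
            labels[i] = lab  # every segment inherits its chunk's label
    return labels
```

Capping what the main clusterer sees is what bounds its cost: whatever its complexity in the number of inputs, it is evaluated on at most `main_max` items regardless of recording length.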

clusterer, efficient real-time streaming, spectral, (12 more...)

arXiv.org Artificial Intelligence

2210.1369

Country:

North America > United States (0.04)
South America > Brazil (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Model-based clustering using non-parametric Hidden Markov Models

Gassiat, Elisabeth, Kaddouri, Ibrahim, Naulet, Zacharie

arXiv.org Machine Learning, Sep-25-2023

Thanks to their dependency structure, non-parametric Hidden Markov Models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification and the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plug-in classifier. All these results are shown to remain valid in the online setting, where observations are clustered sequentially. Simulations illustrate our findings.
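In the HMM setting each hidden state plays the role of a cluster, so a plug-in rule labels observation t with the state of highest posterior probability computed from estimated parameters. The sketch below shows the online variant using the normalised forward (filtering) recursion. For simplicity it assumes a finite observation alphabet with emission probabilities supplied as a table (for instance, a histogram estimate); the function name is hypothetical, and the paper's non-parametric estimators and risk analysis are not reproduced.

```python
from typing import List, Sequence


def online_hmm_cluster(
    obs: Sequence[int],
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
) -> List[int]:
    """Label each observation with argmax_k P(X_t = k | Y_1..Y_t), using plug-in parameters.

    init[k]     : P(X_1 = k)
    trans[j][k] : P(X_{t+1} = k | X_t = j)
    emit[k][y]  : P(Y_t = y | X_t = k), e.g. a histogram estimate
    """
    K = len(init)
    labels: List[int] = []
    alpha: List[float] = []
    for t, y in enumerate(obs):
        if t == 0:
            pred = list(init)
        else:
            # one-step prediction: sum over j of alpha[j] * trans[j][k]
            pred = [sum(alpha[j] * trans[j][k] for j in range(K)) for k in range(K)]
        unnorm = [pred[k] * emit[k][y] for k in range(K)]
        total = sum(unnorm)
        if total == 0:
            raise ValueError(f"observation {y} at t={t} has zero probability under the model")
        alpha = [u / total for u in unnorm]  # filtering distribution at time t
        labels.append(max(range(K), key=alpha.__getitem__))
    return labels
```

Because only past observations are used, each label is available as soon as its observation arrives. With a "sticky" transition matrix, the filter may keep the previous label for a step after the observations switch, which is one concrete way the dependency structure shapes the clustering.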

artificial intelligence, bayes risk, machine learning, (17 more...)

arXiv.org Machine Learning

2309.12238

Country:

North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

Holder, Chris, Middlehurst, Matthew, Bagnall, Anthony

arXiv.org Artificial Intelligence, Apr-26-2023

Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time series specific distance measure, and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as $k$-means. Our focus is on distance-based time series clustering algorithms that employ elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with $k$-means and $k$-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with $k$-means, and even when tuned, is no better. Using $k$-medoids rather than $k$-means improved the clusterings for all nine distance measures. DTW is not significantly better than Euclidean distance with $k$-medoids. Generally, distance measures that employ editing in conjunction with warping perform better, and one distance measure, the move-split-merge (MSM) method, is the best-performing measure in this study. We also compare to clustering with DTW using barycentre averaging (DBA). We find that DBA does improve DTW $k$-means, but that standard DBA is still worse than using MSM. Our conclusion is to recommend MSM with $k$-medoids as the benchmark algorithm for clustering time series with elastic distance measures. We provide implementations in the aeon toolkit, along with results and guidance on reproducing them, in the associated GitHub repository.
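A small $k$-medoids sketch with a pluggable distance shows why the method pairs naturally with elastic measures: medoids are always actual series, so no averaging under the distance is needed (the role DBA plays for DTW $k$-means). The alternating assign/update scheme below is simpler than PAM, and the farthest-first initialisation, the helper names, and the plain full-window DTW used in the example are illustrative choices; for the study's own implementations, see the aeon toolkit mentioned above.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]
Distance = Callable[[Series, Series], float]


def dtw(x: Series, y: Series) -> float:
    """Full-window DTW with squared pointwise cost."""
    inf = float("inf")
    m, n = len(x), len(y)
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[m][n]


def k_medoids(data: List[Series], k: int, dist: Distance, max_iter: int = 20) -> List[int]:
    """Alternating k-medoids; returns a cluster label for each series."""
    n = len(data)
    D = [[dist(a, b) for b in data] for a in data]  # all pairwise distances, computed once
    medoids = [0]
    while len(medoids) < k:  # farthest-first initialisation
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # assignment step: each series joins its nearest medoid
        labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(medoids[c])
                continue
            # update step: the member with least total distance to its cluster
            new_medoids.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels
```

Swapping `dtw` for an MSM implementation gives the MSM with $k$-medoids combination the study recommends as a benchmark; only the `dist` argument changes.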

artificial intelligence, distance measure, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2205.15181

Country:

North America > United States > California > Riverside County > Riverside (0.04)
Europe > United Kingdom > England > Norfolk > Norwich (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Earth Engine Tutorial #31: Machine Learning with Earth Engine - Unsupervised Classification

#artificialintelligence, Aug-7-2020, 15:26:07 GMT

This tutorial shows you how to perform unsupervised classification (e.g., KMeans clustering) in Earth Engine. The ee.Clusterer package handles unsupervised classification (or clustering) in Earth Engine. These algorithms are currently based on the algorithms with the same name in Weka. More details about each Clusterer are available in the reference docs in the Code Editor. Clusterers are used in the same manner as classifiers in Earth Engine.

artificial intelligence, earth engine, machine learning, (6 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Optimal Clustering under Uncertainty

Dalton, Lori A., Benalcázar, Marco E., Dougherty, Edward R.

arXiv.org Machine Learning, Jun-2-2018

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.
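The Bayes-clusterer idea can be made concrete on a toy problem with a handful of points and a finite uncertainty class. In this sketch, which is an illustration under simplifying assumptions rather than the paper's construction for general point processes, each model supplies a prior weight, the likelihood of the observed point set, and a distribution over labelings of those points. Weighting models by prior times likelihood gives an effective labeling distribution, and the robust clusterer picks the candidate partition with the smallest expected fraction of misclustered points, with labels matched up to permutation. All function names are hypothetical.

```python
from itertools import permutations, product
from typing import Dict, List, Sequence, Tuple

Labeling = Tuple[int, ...]


def misclustered_fraction(a: Labeling, b: Labeling, k: int) -> float:
    """Fraction of points on which two labelings disagree, minimised over label permutations."""
    best = min(
        sum(1 for x, y in zip(a, b) if perm[x] != y) for perm in permutations(range(k))
    )
    return best / len(a)


def effective_distribution(
    models: Sequence[Tuple[float, float, Dict[Labeling, float]]]
) -> Dict[Labeling, float]:
    """Mix per-model labeling distributions, weighting each model by prior * likelihood."""
    total = sum(prior * lik for prior, lik, _ in models)
    eff: Dict[Labeling, float] = {}
    for prior, lik, dist in models:
        w = prior * lik / total  # posterior weight of this model given the points
        for lab, p in dist.items():
            eff[lab] = eff.get(lab, 0.0) + w * p
    return eff


def all_labelings(n: int, k: int) -> List[Labeling]:
    """Every assignment of n points to k labels (feasible only for tiny n)."""
    return list(product(range(k), repeat=n))


def bayes_partition(dist: Dict[Labeling, float], candidates: List[Labeling], k: int) -> Labeling:
    """Candidate labeling with minimum expected misclustered fraction under `dist`."""
    return min(
        candidates,
        key=lambda c: sum(p * misclustered_fraction(c, lab, k) for lab, p in dist.items()),
    )
```

Applying `bayes_partition` to a single model's distribution gives that model's Bayes clusterer; applying it to the output of `effective_distribution` gives a clusterer that is optimal on average over the uncertainty class, and the two can differ when the models disagree.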

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1806.00672

Country:

Europe (0.67)
North America > United States > Texas (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)