AITopics | k-means model

k-means model

Data-Driven Discovery of Feature Groups in Clinical Time Series

Sergeev, Fedor, Burger, Manuel, Leshetkina, Polina, Fortuin, Vincent, Rätsch, Gunnar, Kuznetsova, Rita

arXiv.org Artificial Intelligence, Nov-12-2025

Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.

artificial intelligence, feature group, machine learning, (17 more...)

2511.0826

Country: Europe > Switzerland (0.46)

Genre: Research Report > Promising Solution (0.66)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization

Tang, Beilong, Miao, Xiaoxiao, Wang, Xin, Li, Ming

arXiv.org Artificial Intelligence, Aug-19-2025

Voice anonymization protects speaker privacy by concealing identity while preserving linguistic and paralinguistic content. Self-supervised learning (SSL) representations encode linguistic features but preserve speaker traits. We propose a novel speaker-embedding-free framework called SEF-MK. Instead of using a single k-means model trained on the entire dataset, SEF-MK anonymizes SSL representations for each utterance by randomly selecting one of multiple k-means models, each trained on a different subset of speakers. We explore this approach from both attacker and user perspectives. Extensive experiments show that, compared to a single k-means model, SEF-MK with multiple k-means models better preserves linguistic and emotional content from the user's viewpoint. However, from the attacker's perspective, utilizing multiple k-means models boosts the effectiveness of privacy attacks. These insights can aid users in designing voice anonymization systems to mitigate attacker threats.
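The per-utterance selection step described in this abstract can be illustrated with a minimal sketch. It assumes each k-means model is already trained and is represented only by its list of centroids; the names `nearest_centroid`, `quantize_utterance`, and `codebooks`, as well as the toy two-dimensional frames, are hypothetical and not taken from the paper.

```python
import random


def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(frame, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_utterance(frames, codebooks, rng):
    """Pick one codebook at random, then replace every frame with its nearest centroid."""
    model_index = rng.randrange(len(codebooks))
    centroids = codebooks[model_index]
    quantized = [centroids[nearest_centroid(frame, centroids)] for frame in frames]
    return model_index, quantized


# Hypothetical codebooks: in the setting described above, each would come from
# a k-means model trained on a different subset of speakers.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
utterance = [(0.1, 0.2), (0.9, 0.8)]  # toy stand-ins for SSL frame vectors

rng = random.Random(0)
model_index, quantized = quantize_utterance(utterance, codebooks, rng)
```

Every frame of one utterance goes through the same randomly chosen codebook, so the choice of model varies between utterances rather than within them.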

artificial intelligence, k-means model, machine learning, (19 more...)

2508.07086

Country: Asia (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

Xu, Jing, Wu, Minglin, Wu, Xixin, Meng, Helen

arXiv.org Artificial Intelligence, Jun-20-2024

Self-supervised learning (SSL) models have shown strong performance in various downstream tasks. However, they are typically developed for a limited set of languages and may encounter new languages in the real world. Developing an SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existing SSL models to a new language without impairing their original abilities. We propose adaptation methods that integrate LoRA into existing SSL models to extend them to a new language. We also develop preservation strategies, which include data combination and re-clustering, to retain abilities on existing languages. Applied to mHuBERT, we investigate their effectiveness on a speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin), with the MOS value increased by about 1.6 and the relative WER reduced by up to 61.72%. Also, our preservation strategies ensure that performance on both existing and new languages remains intact.

adaptation, adaptation strategy, ssl model, (13 more...)

2406.14092

Country: Asia > China > Hong Kong (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Data Science > Data Mining (0.67)

Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT

Chen, Ke, Wichern, Gordon, Germain, François G., Le Roux, Jonathan

arXiv.org Artificial Intelligence, Apr-4-2023

In spite of the progress in music source separation research, the small amount of publicly-available clean source data remains a constant limiting factor for performance. Thus, recent advances in self-supervised learning present a largely-unexplored opportunity for improving separation models by leveraging unlabelled music data. In this paper, we propose a self-supervised learning framework for music source separation inspired by the HuBERT speech representation model. We first investigate the potential impact of the original HuBERT model by inserting an adapted version of it into the well-known Demucs V2 time-domain separation model architecture. We then propose a time-frequency-domain self-supervised model, Pac-HuBERT (for primitive auditory clustering HuBERT), that we later use in combination with a Res-U-Net decoder for source separation. Pac-HuBERT uses primitive auditory features of music as unsupervised clustering labels to initialize the self-supervised pretraining process using the Free Music Archive (FMA) dataset. The resulting framework achieves better source-to-distortion ratio (SDR) performance on the MusDB18 test set than the original Demucs V2 and Res-U-Net models. We further demonstrate that it can boost performance with small amounts of supervised data. Ultimately, our proposed framework is an effective solution to the challenge of limited clean source data for music source separation.

artificial intelligence, deep learning, machine learning, (20 more...)

2304.0216

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees

Wang, Ziwen, Yuan, Yancheng, Ma, Jiaming, Zeng, Tieyong, Sun, Defeng

arXiv.org Artificial Intelligence, Mar-29-2023

In this paper, we propose a randomly projected convex clustering model for clustering a collection of $n$ high-dimensional data points in $\mathbb{R}^d$ with $K$ hidden clusters. Compared to the convex clustering model for clustering the original data with dimension $d$, we prove that, under some mild conditions, the perfect recovery of the cluster membership assignments of the convex clustering model, if it exists, can be preserved by the randomly projected convex clustering model with embedding dimension $m = O(\epsilon^{-2}\log(n))$, where $0 < \epsilon < 1$ is some given parameter. We further prove that the embedding dimension can be improved to $O(\epsilon^{-2}\log(K))$, which is independent of the number of data points. Extensive numerical results are presented to demonstrate the robustness and superior performance of the randomly projected convex clustering model. These results also demonstrate that the randomly projected convex clustering model can outperform the randomly projected K-means model in practice.
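To make the embedding-dimension bound concrete, the sketch below projects points to $m$ dimensions before any clustering is run. It assumes a Gaussian projection matrix scaled by $1/\sqrt{m}$, a common Johnson-Lindenstrauss-style construction that the abstract does not confirm as the paper's choice, and the value `constant=4.0` hidden inside the $O(\cdot)$ is purely illustrative. The function names are hypothetical.

```python
import math
import random


def embedding_dimension(n_points, epsilon, constant=4.0):
    """Return m = ceil(constant * log(n) / epsilon^2).

    `constant` is an illustrative assumption; the abstract only gives the O(.) rate.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    return math.ceil(constant * math.log(n_points) / epsilon ** 2)


def random_projection(points, m, rng):
    """Map d-dimensional points to m dimensions with a Gaussian matrix scaled by 1/sqrt(m)."""
    d = len(points[0])
    scale = 1.0 / math.sqrt(m)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]


rng = random.Random(0)
# Toy data: 10 points in 50 dimensions.
points = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(10)]

m = embedding_dimension(n_points=len(points), epsilon=0.9)
projected = random_projection(points, m, rng)
```

Passing the number of clusters $K$ instead of `n_points` would mirror the improved $O(\epsilon^{-2}\log(K))$ bound, again only up to the unknown constant.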

artificial intelligence, machine learning, rpccm, (15 more...)

2303.16841

Country:

Asia > China > Hong Kong (0.05)
Asia > China > Jiangsu Province > Yancheng (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Asymptotics for The $k$-means

Zhang, Tonglin

arXiv.org Artificial Intelligence, Nov-17-2022

Clustering is one of the most important unsupervised learning techniques for understanding the underlying data structures. The goal is to partition a data set into many subsets, called clusters, such that the observations within the subsets are the most homogeneous and the observations between the subsets are the most heterogeneous. Clustering is usually carried out by specifying a similarity or dissimilarity measure between observations. Examples include the k-means [17, 19, 29, 37], the k-medians [3], the k-modes [5], and the generalized k-means [2, 31, 45], as well as many of their modifications [21, 24, 42]. Among those, the k-means has been considered as one of the most straightforward and popular methods since it was proposed sixty years ago [23, 36]. Although it is well known, the investigation of the theoretical properties is still far behind, leading to difficulties in developing more precise k-means methods in practice. The goal of the present research is to propose a new concept called clustering consistency for the asymptotics of the k-means with a resulting clustering method better than the existing k-means methods adopted by many software packages, including those adopted by R and Python.
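For reference, the homogeneity criterion that k-means optimizes is usually written as the within-cluster sum of squares. The block below is the standard textbook formulation; the notation ($C_j$ for clusters, $\mu_j$ for their means) is an assumption and may differ from the paper's.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```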

artificial intelligence, k-means method, machine learning, (17 more...)

2211.10015

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

Huang, Wen-Chin, Yang, Shu-Wen, Hayashi, Tomoki, Toda, Tomoki

arXiv.org Artificial Intelligence, Jul-9-2022

We present a large-scale comparative study of self-supervised speech representation (S3R)-based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive owing to their potential to replace expensive supervised representations such as phonetic posteriorgrams (PPGs), which are commonly adopted by state-of-the-art VC systems. Using S3PRL-VC, an open-source VC toolkit we previously developed, we provide a series of in-depth objective and subjective analyses under three VC settings: intra-/cross-lingual any-to-one (A2O) and any-to-any (A2A) VC, using the voice conversion challenge 2020 (VCC2020) dataset. We investigated S3R-based VC in various aspects, including model type, multilinguality, and supervision. We also studied the effect of a post-discretization process with k-means clustering and showed how it improves in the A2A setting. Finally, the comparison with state-of-the-art VC systems demonstrates the competitiveness of S3R-based VC and also sheds light on possible directions for improvement.

machine learning, natural language, proc, (15 more...)

doi: 10.1109/JSTSP.2022.3193761

2207.04356

Country:

North America > United States (0.28)
Asia > Japan (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

K-Means Clustering Project -- Banknote Authentication

#artificialintelligence, Jun-4-2021, 10:06:04 GMT

Have you ever handed money to a supermarket clerk, with a long line of people waiting behind you, only to find out that the money was fake? I experienced this once, and the embarrassment of being taken for an immoral cheapskate stuck in my head for a long time. This motivated me to conduct this project: building a K-Means Clustering model to detect whether a banknote is real or fake. The dataset is about distinguishing genuine and forged banknotes. Data were extracted from images that were taken from genuine and forged banknote-like specimens.
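A two-cluster K-Means model of the kind this project describes can be sketched with Lloyd's algorithm. The code below is a minimal illustration and not the author's implementation: the two well-separated synthetic blobs are hypothetical stand-ins for features of genuine and forged notes, and the function names are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, rng, max_iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


rng = random.Random(0)
# Hypothetical two-feature data: 20 "genuine" notes near (0, 0), 20 "forged" near (10, 10).
genuine = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(20)]
forged = [(rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)) for _ in range(20)]
points = genuine + forged

centroids, labels = k_means(points, k=2, rng=rng)
```

K-means is unsupervised, so the cluster indices carry no meaning on their own: deciding which cluster corresponds to forged notes requires comparing the clusters against known examples afterwards.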

dataset, k-means, k-means clustering project, (12 more...)

Industry: Information Technology > Security & Privacy (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.62)

How to Build Audience Clusters With Website Data Using BigQuery ML

#artificialintelligence, Nov-8-2020, 13:12:43 GMT

A common marketing analytics challenge is to understand consumer behavior and develop customer attributes or archetypes. As organizations get better at tackling this problem, they can activate marketing strategies to incorporate additional customer knowledge into their campaigns. Building customer profiles is now easier than ever with BigQuery ML, using a technique called clustering. In this post, you'll learn how to create segmentation and how to use these audiences for marketing activation. Clustering algorithms can group similar user behavior together to build segmentation used for marketing.

algorithm, bigquery ml, google analytic 360, (13 more...)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Enterprise Applications > Customer Relationship Management (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.38)

Deep Clustering for Financial Market Segmentation

#artificialintelligence, Nov-25-2019, 09:33:17 GMT

Unsupervised learning, supervised learning, and reinforcement learning are the three main categories of machine learning methods. Unsupervised learning has many applications, such as clustering and dimensionality reduction. The machine learning algorithms K-means and Principal Component Analysis (PCA) are widely used for clustering and dimensionality reduction, respectively. Like PCA, t-distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised machine learning algorithm for dimensionality reduction. With the advancement of unsupervised deep learning, the Autoencoder neural network is now frequently used to reduce high-dimensional data (e.g., a dataset with thousands or more features). An Autoencoder can also be combined with supervised learning (e.g., Random Forest) to form a semi-supervised learning method (see deep patient as an example).

clustering, dataset, silhouette score, (9 more...)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)