AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

GPU-accelerated Faster Mean Shift with euclidean distance metrics

You, Le, Jiang, Han, Hu, Jinyong, Chang, Chorng, Chen, Lingxi, Cui, Xintong, Zhao, Mengyang

arXiv.org Artificial Intelligence | Dec-27-2021

Handling clustering problems is important in data statistics, pattern recognition and image processing. The mean-shift algorithm, a common unsupervised algorithm, is widely used to solve clustering problems. However, the mean-shift algorithm is restricted by its huge computational resource cost. In previous research [10], we proposed a novel GPU-accelerated Faster Mean-shift algorithm, which greatly speeds up the cosine-embedding clustering problem. In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics. Unlike conventional GPU-based mean-shift algorithms, our algorithm adopts novel Seed Selection & Early Stopping approaches, which greatly increase computing speed and reduce GPU memory consumption. In simulation testing, when processing a 200K-point clustering problem, our algorithm achieved around a 3-times speedup compared to the state-of-the-art GPU-based mean-shift algorithms with optimized GPU memory consumption. Moreover, in this study, we implemented a plug-and-play model for the faster mean-shift algorithm, which can be easily deployed. (The plug-and-play model is available at https://github.com/masqm/Faster-Mean-Shift-Euc)
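For readers unfamiliar with mean shift, the following is a minimal CPU sketch of the standard algorithm with a flat kernel and Euclidean distance. It is not the authors' GPU implementation and does not reproduce their Seed Selection approach. The function name `mean_shift`, the `bandwidth`, `max_iter` and `tol` parameters, and the rule for merging nearby modes are all assumptions made for illustration. The loop exits as soon as a point stops moving, which is only a simple per-point convergence check.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Plain mean shift with a flat kernel (illustrative sketch only).

    Every point is moved repeatedly to the mean of the input points that
    lie within `bandwidth` (Euclidean distance) of its current position.
    """
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Input points inside the window centred on the current position.
            neighbors = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = [sum(coord) / len(neighbors) for coord in zip(*neighbors)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:  # the position has stopped moving
                break
        modes.append(x)

    # Hypothetical merging rule: modes closer than bandwidth / 2 are
    # treated as the same cluster centre.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = mean_shift(points, bandwidth=1.0)
print(centers, labels)
```

This sketch costs a full pass over the data for every shift of every point, which illustrates the computational cost the abstract refers to.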

algorithm, memory consumption, vector, (13 more...)

arXiv.org Artificial Intelligence

2112.13891

Country:

North America > United States > Massachusetts > Middlesex County > Medford (0.05)
North America > United States > New Hampshire > Grafton County > Hanover (0.04)
North America > United States > California (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Unsupervised Clustering Active Learning for Person Re-identification

Gao, Wenjing, Li, Minxian

arXiv.org Artificial Intelligence | Dec-25-2021

Supervised person re-identification (re-id) approaches require a large amount of manually labeled pairwise data, which is not available in most real-world re-id deployment scenarios. On the other hand, unsupervised re-id methods rely on unlabeled data to train models but perform poorly compared with supervised re-id methods. In this work, we aim to combine unsupervised re-id learning with a small number of human annotations to achieve competitive performance. Towards this goal, we present an Unsupervised Clustering Active Learning (UCAL) re-id deep learning approach. It is capable of incrementally discovering the representative centroid-pairs and requesting human annotation for them. These few labeled representative pairwise data can improve the unsupervised representation learning model together with the large amount of remaining unlabeled data. More importantly, because the representative centroid-pairs are selected for annotation, UCAL can work with very low-cost human effort. Extensive experiments demonstrate the superiority of the proposed model over state-of-the-art active learning methods on three re-id benchmark datasets.

learning, person re-identification, proc, (12 more...)

arXiv.org Artificial Intelligence

2112.13308

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Application of Markov Structure of Genomes to Outlier Identification and Read Classification

Karr, Alan F., Hauzel, Jason, Porter, Adam A., Schaefer, Marcel

arXiv.org Machine Learning | Dec-24-2021

That the sequential structure of genomes is important has been known since the discovery of DNA. In this paper we employ a statistics and stochastic process perspective on triplets of successive bases to address two important applications: identifying outliers in genome databases, and classifying reads in the metagenomic context of reference-guided assembly. From this stochastic process perspective, triplets are a second-order Markov chain specified by the distribution of each base conditional on its two immediate predecessors. To be sure, studying genomes via base sequence distributions is not novel. Previous papers have addressed genome signatures (Karlin et al., 1997; Campbell et al., 1999; Takashi et al., 2003), as well as frequentist (Rosen et al., 2008) and Bayesian (Wang et al., 2007) approaches to classification problems.
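The second-order Markov chain described above can be illustrated by estimating, from a single sequence, the distribution of each base conditional on its two immediate predecessors. This is only a minimal sketch based on raw triplet counts. The function name `second_order_transitions` and the toy sequence are assumptions for illustration, and nothing here reproduces the paper's outlier-identification or read-classification procedures.

```python
from collections import Counter, defaultdict


def second_order_transitions(sequence):
    """Estimate P(next base | previous two bases) from triplet counts."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - 2):
        context = sequence[i:i + 2]       # the two preceding bases
        next_base = sequence[i + 2]       # the base that follows them
        counts[context][next_base] += 1

    probs = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[context] = {base: n / total for base, n in next_counts.items()}
    return probs


# Toy sequence (hypothetical, not taken from any genome database).
probs = second_order_transitions("ACGACGACT")
for context in sorted(probs):
    print(context, probs[context])
```

In this toy sequence the context `AC` is followed by `G` twice and by `T` once, so the estimated conditional probabilities are 2/3 and 1/3.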

coronavirus genome, genome, probability, (12 more...)

arXiv.org Machine Learning

2112.13117

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

On the Unreasonable Efficiency of State Space Clustering in Personalization Tasks

Dereventsov, Anton, Vatsavai, Ranga Raju, Webster, Clayton

arXiv.org Artificial Intelligence | Dec-24-2021

In this effort we consider a reinforcement learning (RL) technique for solving personalization tasks with complex reward signals. In particular, our approach is based on state space clustering with the use of a simplistic $k$-means algorithm as well as conventional choices of the network architectures and optimization algorithms. Numerical examples demonstrate the efficiency of different RL procedures and are used to illustrate that this technique accelerates the agent's ability to learn and does not restrict the agent's performance.

agent, algorithm, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2112.13141

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Tennessee > Knox County > Knoxville (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

DBScan Clustering Algorithm

#artificialintelligence | Dec-23-2021, 12:55:42 GMT

Clustering is an important topic in business, because it helps us reduce a number of features to a typology: a set of clusters which, when the data allow it, can give us more information about our topic of interest. In the data science literature it is usually presented as a dimension reduction technique, but in science, and even in data science, it can reveal additional patterns in the data that are not obvious at first glance. Imagine you have some features about some students: their marks, their personality traits, their ability scores, their motivation. Clustering could reveal completely new types of (un)successful students (it could be someone with high ability and low motivation, an underachiever, but it could also be someone with high motivation and really good marks but low abilities, an overachiever). This can be done simply by clustering, while our cluster names (overachiever, underachiever) are basically interpretations of the clusters.

dbscan clustering algorithm, domain knowledge, underachiever, (6 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.59)

Optimal Variable Clustering for High-Dimensional Matrix Valued Data

Lee, Inbeom, Deng, Siyi, Ning, Yang

arXiv.org Machine Learning | Dec-23-2021

Matrix valued data has become increasingly prevalent in many applications. Most of the existing clustering methods for this type of data are tailored to the mean model and do not account for the dependence structure of the features, which can be very informative, especially in high-dimensional settings. To extract the information from the dependence structure for clustering, we propose a new latent variable model for the features arranged in matrix form, with some unknown membership matrices representing the clusters for the rows and columns. Under this model, we further propose a class of hierarchical clustering algorithms using the difference of a weighted covariance matrix as the dissimilarity measure. Theoretically, we show that under mild conditions, our algorithm attains clustering consistency in the high-dimensional setting. While this consistency result holds for our algorithm with a broad class of weighted covariance matrices, the conditions for this result depend on the choice of the weight. To investigate how the weight affects the theoretical performance of our algorithm, we establish the minimax lower bound for clustering under our latent variable model. Given these results, we identify the optimal weight in the sense that using this weight guarantees our algorithm to be minimax rate-optimal in terms of the magnitude of some cluster separation metric. The practical implementation of our algorithm with the optimal weight is also discussed. Finally, we conduct simulation studies to evaluate the finite sample performance of our algorithm and apply the method to a genomic dataset.

algorithm, algorithm 1, matrix, (15 more...)

arXiv.org Machine Learning

2112.12909

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.87)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

HuBERT Explained

#artificialintelligence | Dec-21-2021, 12:05:59 GMT

The HuBERT model architecture follows the wav2vec 2.0 architecture consisting of: The number of each of these components varies between the base, large and x-large variations. Each component and its task will be better explained while explaining the training loop. The first training step consists of discovering the hidden units, and the process begins with extracting MFCCs (Mel-frequency cepstral coefficients) from the audio waveform. These are acoustic features useful for representing speech. Each segment of audio is then passed to the K-means clustering algorithm and assigned to one of K clusters.

architecture, hubert explained, training step, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.61)

Graph-based Ensemble Machine Learning for Student Performance Prediction

Wang, Yinkai, Ding, Aowei, Guan, Kaiyi, Wu, Shixi, Du, Yuanqi

arXiv.org Artificial Intelligence | Dec-21-2021

Student performance prediction is a critical research problem for understanding students' needs, presenting proper learning opportunities and resources, and improving teaching quality. However, traditional machine learning methods fail to produce stable and accurate prediction results. In this paper, we propose a graph-based ensemble machine learning method that aims to improve the stability of single machine learning methods via the consensus of multiple methods. Specifically, we leverage both supervised prediction methods and unsupervised clustering methods, and build an iterative approach that propagates in a bipartite graph and converges to more stable and accurate prediction results. Extensive experiments demonstrate the effectiveness of our proposed method in predicting student performance more accurately. Our model outperforms the best traditional machine learning algorithms by up to 14.8% in prediction accuracy.

accuracy, algorithm, prediction, (11 more...)

arXiv.org Artificial Intelligence

2112.07893

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

K-means clustering

#artificialintelligence | Dec-20-2021, 16:50:46 GMT

The topic I will try to explain today is K-means clustering. First, you might be wondering what the "K" means. K is a parameter that corresponds to the number of clusters you are trying to detect. For example, in order to detect 3 clusters like in the image at the top, you would need to use K = 3. But what does it mean?
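To make the role of K concrete, here is a minimal sketch of the standard K-means loop (Lloyd's algorithm) in plain Python, run with K = 3. The function name `k_means`, the seeded random initialisation, and the toy points are assumptions made for illustration and are not taken from the article. Because the starting centroids are chosen at random, the result can depend on the seed.

```python
import math
import random


def k_means(points, k, n_iter=100, seed=0):
    """Standard K-means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Nine toy points forming three visually separate groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 4.9),
          (0.0, 5.0), (0.2, 5.2), (0.1, 4.8)]
centroids, labels = k_means(points, k=3)
print(centroids, labels)
```

Changing `k` changes how many centroids are fitted, and therefore how many clusters the points are split into.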

algorithm, algorithm work, k-means, (1 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Learning Spatio-Temporal Specifications for Dynamical Systems

Alsalehi, Suhail, Aasi, Erfan, Weiss, Ron, Belta, Calin

arXiv.org Artificial Intelligence | Dec-20-2021

Learning properties of dynamical systems from data provides important insights that help us understand such systems and mitigate undesired outcomes. In this work, we propose a framework for learning spatio-temporal (ST) properties as formal logic specifications from data. We introduce SVM-STL, an extension of Signal Temporal Logic (STL), capable of specifying spatial and temporal properties of a wide range of dynamical systems that exhibit time-varying spatial patterns. Our framework utilizes machine learning techniques to learn SVM-STL specifications from system executions given by sequences of spatial patterns. We present methods to deal with both labeled and unlabeled data. In addition, given system requirements in the form of SVM-STL specifications, we provide an approach for parameter synthesis to find parameters that maximize the satisfaction of such specifications. Our learning framework and parameter synthesis approach are showcased in an example of a reaction-diffusion system.

artificial intelligence, machine learning, specification, (18 more...)

arXiv.org Artificial Intelligence

2112.10714

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
