Clustering
A Novel Clustering-Based Algorithm for Continuous and Non-invasive Cuff-Less Blood Pressure Estimation
Farki, Ali, Kazemzadeh, Reza Baradaran, Noughabi, Elham Akhondzadeh
Continuous blood pressure (BP) measurements can reflect a body's response to diseases and serve as a predictor of cardiovascular and other health conditions. While current cuff-based BP measurement methods are incapable of providing continuous BP readings, invasive BP monitoring methods also tend to cause patient dissatisfaction and can potentially cause infection. In this research, we developed a method for estimating blood pressure based on the features extracted from Electrocardiogram (ECG) and Photoplethysmogram (PPG) signals and the Arterial Blood Pressure (ABP) data. The vector of features extracted from the preprocessed ECG and PPG signals is used in this approach, which include Pulse Transit Time (PTT), PPG Intensity Ratio (PIR), and Heart Rate (HR), as the input of a clustering algorithm and then developing separate regression models like Random Forest Regression, Gradient Boosting Regression, and Multilayer Perceptron Regression algorithms for each resulting cluster. We evaluated and compared the findings to create the model with the highest accuracy by applying the clustering approach and identifying the optimal number of clusters, and eventually the acceptable prediction model. The paper compares the results obtained with and without this clustering. The results show that the proposed clustering approach helps obtain more accurate estimates of Systolic Blood Pressure (SBP) and Diastolic Blood Pressure (DBP). Given the inconsistency, high dispersion, and multitude of trends in the datasets for different features, using the clustering approach improved the estimation accuracy by 50-60%.
SSSNET: Semi-Supervised Signed Network Clustering
He, Yixuan, Reinert, Gesine, Wang, Songchao, Cucuringu, Mihai
Node embeddings are a powerful tool in the analysis of networks; yet, their full potential for the important task of node clustering has not been fully exploited. In particular, most state-of-the-art methods generating node embeddings of signed networks focus on link sign prediction, and those that pertain to node clustering are usually not graph neural network (GNN) methods. Here, we introduce a novel probabilistic balanced normalized cut loss for training nodes in a GNN framework for semi-supervised signed network clustering, called SSSNET. The method is end-to-end in combining embedding generation and clustering without an intermediate step; it has node clustering as main focus, with an emphasis on polarization effects arising in networks. The main novelty of our approach is a new take on the role of social balance theory for signed network embeddings. The standard heuristic for justifying the criteria for the embeddings hinges on the assumption that "an enemy's enemy is a friend". Here, instead, a neutral stance is assumed on whether or not the enemy of an enemy is a friend. Experimental results on various data sets, including a synthetic signed stochastic block model, a polarized version of it, and real-world data at different scales, demonstrate that SSSNET can achieve comparable or better results than state-of-the-art spectral clustering methods, for a wide range of noise and sparsity levels. SSSNET complements existing methods through the possibility of including exogenous information, in the form of node-level features or labels.
Image Compression with K-means Clustering
We've only seen Supervised learning algorithms in my prior articles. In this post, we will learn about K-means and begin with an example 2D dataset. Following that, we will apply the K-means algorithm to compress images by reducing the number of colors in a picture to only those that are common in that image. In Unsupervised learning, our algorithm will learn from unlabelled data instead from a labelled data. So, what is Unsupervised learning?
K Means Clustering in Python : Label the Unlabeled Data
There are some cases when you have a dataset that is mostly unlabeled. The problems start when you want to structure the datasets and make it valuable by labeling it. In machine learning, there are various methods for labeling these datasets. Clustering is one of them. In this tutorial of "How to", you will learn to do K Means Clustering in Python.
K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters
Mohammadi, Seyed Omid, Kalhor, Ahmad, Bodaghi, Hossein
This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets MNIST and Fashion-MNIST, to prove that our algorithm has excellent accuracy in finding the correct number of clusters under different conditions. We also show that k-splits is faster than similar methods and can even be faster than the standard k-means in lower dimensions. Finally, we suggest using k-splits to uncover the exact position of centroids and then input them as initial points to the k-means algorithm to fine-tune the results.
Clustering Human Trust Dynamics for Customized Real-time Prediction
Liu, Jundi, Akash, Kumar, Misu, Teruhisa, Wu, Xingwei
Trust calibration is necessary to ensure appropriate user acceptance in advanced automation technologies. A significant challenge to achieve trust calibration is to quantitatively estimate human trust in real-time. Although multiple trust models exist, these models have limited predictive performance partly due to individual differences in trust dynamics. A personalized model for each person can address this issue, but it requires a significant amount of data for each user. We present a methodology to develop customized model by clustering humans based on their trust dynamics. The clustering-based method addresses the individual differences in trust dynamics while requiring significantly less data than personalized model. We show that our clustering-based customized models not only outperform the general model based on entire population, but also outperform simple demographic factor-based customized models. Specifically, we propose that two models based on ``confident'' and ``skeptical'' group of participants, respectively, can represent the trust behavior of the population. The ``confident'' participants, as compared to the ``skeptical'' participants, have higher initial trust levels, lose trust slower when they encounter low reliability operations, and have higher trust levels during trust-repair after the low reliability operations. In summary, clustering-based customized models improve trust prediction performance for further trust calibration considerations.
Clustering Plotted Data by Image Segmentation
Naous, Tarek, Sarkar, Srinjay, Abid, Abubakar, Zou, James
Clustering algorithms are one of the main analytical methods to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a dataset as points in a metric space and compute distances to group together similar points. In this paper, we present a wholly different way of clustering points in 2-dimensional space, inspired by how humans cluster data: by training neural networks to perform instance segmentation on plotted data. Our approach, Visual Clustering, has several advantages over traditional clustering algorithms: it is much faster than most existing clustering algorithms (making it suitable for very large datasets), it agrees strongly with human intuition for clusters, and it is by default hyperparameter free (although additional steps with hyperparameters can be introduced for more control of the algorithm). We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages. We then demonstrate how our approach can be extended to higher dimensional data and illustrate its performance on real-world data. The implementation of Visual Clustering is publicly available and can be applied to any dataset in a few lines of code.
Graphon based Clustering and Testing of Networks: Algorithms and Theory
Sabanayagam, Mahalakshmi, Vankadara, Leena Chennuru, Ghoshdastidar, Debarghya
Network-valued data are encountered in a wide range of applications, and pose challenges in learning due to their complex structure and absence of vertex correspondence. Typical examples of such problems include classification or grouping of protein structures and social networks. Various methods, ranging from graph kernels to graph neural networks, have been proposed that achieve some success in graph classification problems. However, most methods have limited theoretical justification, and their applicability beyond classification remains unexplored. In this work, we propose methods for clustering multiple graphs, without vertex correspondence, that are inspired by the recent literature on estimating graphons-- symmetric functions corresponding to infinite vertex limit of graphs. We propose a novel graph distance based on sorting-and-smoothing graphon estimators. Using the proposed graph distance, we present two clustering algorithms and show that they achieve state-of-the-art results. We prove the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees. We further study the applicability of the proposed distance for graph two-sample testing problems. Machine learning on graphs has evolved considerably over the past two decades. The traditional view towards network analysis is limited to modelling interactions among entities of interest, for instance social networks or world wide web, and learning algorithms based on graph theory have been commonly used to solve these problems (Von Luxburg, 2007; Yan et al., 2006).
Designing Complex Experiments by Applying Unsupervised Machine Learning
Design of experiments (DOE) is playing an essential role in learning and improving a variety of objects and processes. The article discusses the application of unsupervised machine learning to support the pragmatic designs of complex experiments. Complex experiments are characterized by having a large number of factors, mixed-level designs, and may be subject to constraints that eliminate some unfeasible trials for various reasons. Having such attributes, it is very challenging to design pragmatic experiments that are economically, operationally, and timely sound. It means a significant decrease in the number of required trials from a full factorial design, while still attempting to achieve the defined objectives. A beta variational autoencoder (beta-VAE) has been applied to represent trials of the initial full factorial design after filtering out unfeasible trials on the low dimensional latent space. Regarding visualization and interpretability, the paper is limited to 2D representations. Beta-VAE supports (1) orthogonality of the latent space dimensions, (2) isotropic multivariate standard normal distribution of the representation on the latent space, (3) disentanglement of the latent space representation by levels of factors, (4) propagation of the applied constraints of the initial design into the latent space, and (5) generation of trials by decoding latent space points. Having an initial design representation on the latent space with such properties, it allows for the generation of pragmatic design of experiments (G-DOE) by specifying the number of trials and their pattern on the latent space, such as square or polar grids. Clustering and aggregated gradient metrics have been shown to guide grid specification.
Context-Aware Unsupervised Clustering for Person Search
Han, Byeong-Ju, Ko, Kuhyeun, Sim, Jae-Young
The existing person search methods use the annotated labels of person identities to train deep networks in a supervised manner that requires a huge amount of time and effort for human labeling. In this paper, we first introduce a novel framework of person search that is able to train the network in the absence of the person identity labels, and propose efficient unsupervised clustering methods to substitute the supervision process using annotated person identity labels. Specifically, we propose a hard negative mining scheme based on the uniqueness property that only a single person has the same identity to a given query person in each image. We also propose a hard positive mining scheme by using the contextual information of co-appearance that neighboring persons in one image tend to appear simultaneously in other images. The experimental results show that the proposed method achieves comparable performance to that of the state-of-the-art supervised person search methods, and furthermore outperforms the extended unsupervised person re-identification methods on the benchmark person search datasets.