Goto

Collaborating Authors

 Clustering


Robust Trimmed k-means

arXiv.org Machine Learning

Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers.


AdaCon: Adaptive Context-Aware Object Detection for Resource-Constrained Embedded Devices

arXiv.org Artificial Intelligence

Convolutional Neural Networks achieve state-of-the-art accuracy in object detection tasks. However, they have large computational and energy requirements that challenge their deployment on resource-constrained edge devices. Object detection takes an image as an input, and identifies the existing object classes as well as their locations in the image. In this paper, we leverage the prior knowledge about the probabilities that different object categories can occur jointly to increase the efficiency of object detection models. In particular, our technique clusters the object categories based on their spatial co-occurrence probability. We use those clusters to design an adaptive network. During runtime, a branch controller decides which part(s) of the network to execute based on the spatial context of the input frame. Our experiments using COCO dataset show that our adaptive object detection model achieves up to 45% reduction in the energy consumption, and up to 27% reduction in the latency, with a small loss in the average precision (AP) of object detection.


Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification

arXiv.org Artificial Intelligence

Person re-identification (ReID) aims to re-identify a person from non-overlapping camera views. Since person ReID data contains sensitive personal information, researchers have adopted federated learning, an emerging distributed training method, to mitigate the privacy leakage risks. However, existing studies rely on data labels that are laborious and time-consuming to obtain. We present FedUReID, a federated unsupervised person ReID system to learn person ReID models without any labels while preserving privacy. FedUReID enables in-situ model training on edges with unlabeled data. A cloud server aggregates models from edges instead of centralizing raw data to preserve data privacy. Moreover, to tackle the problem that edges vary in data volumes and distributions, we personalize training in edges with joint optimization of cloud and edge. Specifically, we propose personalized epoch to reassign computation throughout training, personalized clustering to iteratively predict suitable labels for unlabeled data, and personalized update to adapt the server aggregated model to each edge. Extensive experiments on eight person ReID datasets demonstrate that FedUReID not only achieves higher accuracy but also reduces computation cost by 29%. Our FedUReID system with the joint optimization will shed light on implementing federated learning to more multimedia tasks without data labels.


Use-Cases of K-Means Clustering

#artificialintelligence

In this blog, first of all we will see what is K-Means Clustering Algorithm and then discuss about some of it's Industry use-cases. Unsupervised learning is a type of machine learning in which models are trained using unlabeled dataset and are allowed to act on that data without any supervision. Unsupervised learning cannot be directly applied to a regression or classification problem because unlike supervised learning, we have the input data but no corresponding output data. The goal of unsupervised learning is to find the underlying structure of dataset, group that data according to similarities, and represent that dataset in a compressed format. K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into different clusters.


Learn K-means Clustering

#artificialintelligence

One of the most popular Machine Learning algorithms is K-means clustering. It is an unsupervised learning algorithm, meaning that it is used for unlabeled datasets. K-means clustering algorithm is an unsupervised technique to group data in the order of their similarities. We then find patterns within this data which are present as k-clusters. Each of the n value belongs to the k cluster with the nearest mean.


Intelligent computational model for the classification of Covid-19 with chest radiography compared to other respiratory diseases

arXiv.org Artificial Intelligence

Lung X-ray images, if processed using statistical and computational methods, can distinguish pneumonia from COVID-19. The present work shows that it is possible to extract lung X-ray characteristics to improve the methods of examining and diagnosing patients with suspected COVID-19, distinguishing them from malaria, dengue, H1N1, tuberculosis, and Streptococcus pneumonia. More precisely, an intelligent computational model was developed to process lung X-ray images and classify whether the image is of a patient with COVID-19. The images were processed and extracted their characteristics. These characteristics were the input data for an unsupervised statistical learning method, PCA, and clustering, which identified specific attributes of X-ray images with Covid-19. The introduction of statistical models allowed a fast algorithm, which used the X-means clustering method associated with the Bayesian Information Criterion (CIB). The developed algorithm efficiently distinguished each pulmonary pathology from X-ray images. The method exhibited excellent sensitivity. The average recognition accuracy of COVID-19 was 0.93 and 0.051.


A Mathematical Approach to Constraining Neural Abstraction and the Mechanisms Needed to Scale to Higher-Order Cognition

arXiv.org Artificial Intelligence

Artificial intelligence has made great strides in the last decade but still falls short of the human brain, the best-known example of intelligence. Not much is known of the neural processes that allow the brain to make the leap to achieve so much from so little beyond its ability to create knowledge structures that can be flexibly and dynamically combined, recombined, and applied in new and novel ways. This paper proposes a mathematical approach using graph theory and spectral graph theory, to hypothesize how to constrain these neural clusters of information based on eigen-relationships. This same hypothesis is hierarchically applied to scale up from the smallest to the largest clusters of knowledge that eventually lead to model building and reasoning.


K-means for Beginners: How to Build from Scratch in Python

#artificialintelligence

The K-means algorithm is a method for dividing a set of data points into distinct clusters, or groups, based on similar attributes. It is an unsupervised learning algorithm which means it does not require labeled data in order to find patterns in the dataset. K-means is an approachable introduction to clustering for developers and data scientists interested in machine learning. In this article, you will learning how to implement k-means entirely from scratch and gain a strong understanding of the k-means algorithm. The goal of clustering is to divide items into groups such that objects in a group are more similar than those outside the group. Stepping aside from programming for a second, we can gain a better understanding of clusters.


Personalized Federated Learning with Clustering: Non-IID Heart Rate Variability Data Application

arXiv.org Artificial Intelligence

While machine learning techniques are being applied to various fields for their exceptional ability to find complex relations in large datasets, the strengthening of regulations on data ownership and privacy is causing increasing difficulty in its application to medical data. In light of this, Federated Learning has recently been proposed as a solution to train on private data without breach of confidentiality. This conservation of privacy is particularly appealing in the field of healthcare, where patient data is highly confidential. However, many studies have shown that its assumption of Independent and Identically Distributed data is unrealistic for medical data. In this paper, we propose Personalized Federated Cluster Models, a hierarchical clustering-based FL process, to predict Major Depressive Disorder severity from Heart Rate Variability. By allowing clients to receive more personalized model, we address problems caused by non-IID data, showing an accuracy increase in severity prediction. This increase in performance may be sufficient to use Personalized Federated Cluster Models in many existing Federated Learning scenarios.


A Machine learning approach for rapid disaster response based on multi-modal data. The case of housing & shelter needs

arXiv.org Artificial Intelligence

Along with climate change, more frequent extreme events, such as flooding and tropical cyclones, threaten the livelihoods and wellbeing of poor and vulnerable populations. One of the most immediate needs of people affected by a disaster is finding shelter. While the proliferation of data on disasters is already helping to save lives, identifying damages in buildings, assessing shelter needs, and finding appropriate places to establish emergency shelters or settlements require a wide range of data to be combined rapidly. To address this gap and make a headway in comprehensive assessments, this paper proposes a machine learning workflow that aims to fuse and rapidly analyse multimodal data. This workflow is built around open and online data to ensure scalability and broad accessibility. Based on a database of 19 characteristics for more than 200 disasters worldwide, a fusion approach at the decision level was used. This technique allows the collected multimodal data to share a common semantic space that facilitates the prediction of individual variables. Each fused numerical vector was fed into an unsupervised clustering algorithm called Self-Organizing-Maps (SOM). The trained SOM serves as a predictor for future cases, allowing predicting consequences such as total deaths, total people affected, and total damage, and provides specific recommendations for assessments in the shelter and housing sector. To achieve such prediction, a satellite image from before the disaster and the geographic and demographic conditions are shown to the trained model, which achieved a prediction accuracy of 62 %