AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Quantile-based fuzzy C-means clustering of multivariate time series: Robust techniques

López-Oriona, Ángel, D'Urso, Pierpaolo, Vilar, José Antonio, Lafuente-Rego, Borja

arXiv.org Machine Learning, Sep-22-2021

In particular, time series data have become ubiquitous nowadays, arising frequently in a broad variety of fields including medicine, computer science, finance, environmental sciences, machine learning, marketing and neuroscience, among many others. Typically, time series involve a huge number of records, present dynamic behavior patterns that may change over time, and frequently come as realizations of different lengths. Due to this complex nature, standard techniques for data mining tasks such as classification, clustering or anomaly detection often produce unsatisfactory results. Complexity increases further when dealing with high-dimensional time series, where the interdependence structure and large dimensionality are serious obstacles to developing efficient procedures. Univariate time series (UTS) were the main focus of intensive research until recently, but multivariate time series (MTS) have lately received a great deal of attention due to advances in technology and in the storage capabilities of everyday devices. Well-known examples of MTS are multi-lead ECG signals of patients or records containing several economic indicators of a given country over time, and many other examples can easily be found in other fields. Among time series data mining tasks, clustering is a central problem. In fact, identifying groups of similar series is fundamental for many applications: to detect a few representative patterns, forecast future behavior, quantify affinity, or recognize dynamic changes and structural breaks. However, unlike in traditional databases, similarity search in time series data is a complex issue that cannot be addressed with conventional methods.

scenario 1, scenario 2, time series

arXiv:2109.11027

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Galicia > A Coruña Province > A Coruña (0.04)
Europe > Italy (0.04)

Genre: Research Report > Experimental Study (0.45)

Industry:

Banking & Finance > Trading (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.54)
Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.68)


Multi-Slice Clustering for 3-order Tensor Data

Andriantsiory, Dina Faneva, Geloun, Joseph Ben, Lebbah, Mustapha

arXiv.org Machine Learning, Sep-22-2021

Several triclustering methods for three-dimensional data require the cluster size to be specified in each dimension, which introduces a certain degree of arbitrariness. To address this issue, we propose a new method, multi-slice clustering (MSC), for 3-order tensor data sets. In each dimension (tensor mode), we analyse the spectral decomposition of each tensor slice, which is a matrix. From these decompositions we define a similarity measure between matrix slices, up to a threshold (precision) parameter, and use it to identify a cluster. The intersection of all partial clusters provides the desired triclustering. The effectiveness of our algorithm is shown on both synthetic and real-world data sets.
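The slice-wise idea can be illustrated with a small NumPy sketch. It is a hypothetical simplification, not the authors' MSC algorithm: for each mode it takes the leading left singular vector of every slice, uses the absolute cosine to the vector of the strongest slice (largest leading singular value) as the similarity, keeps slices whose similarity is at least `1 - eps` (standing in for the precision threshold), and returns one index set per mode; the Cartesian product of those sets is the tricluster. The function names `mode_cluster` and `multi_slice_tricluster` and the reference-slice rule are assumptions made for illustration.

```python
import numpy as np


def mode_cluster(tensor, mode, eps):
    """Indices of the mode-`mode` slices that align with the strongest slice.

    Hypothetical rule: similarity = |cosine| between leading left singular
    vectors; a slice joins the cluster if its similarity is >= 1 - eps.
    """
    slices = np.moveaxis(tensor, mode, 0)  # slices[i] is one matrix slice
    lead_vecs, lead_vals = [], []
    for s in slices:
        u, sv, _ = np.linalg.svd(s, full_matrices=False)
        lead_vecs.append(u[:, 0])
        lead_vals.append(sv[0])
    # Reference: the slice with the largest leading singular value.
    ref = lead_vecs[int(np.argmax(lead_vals))]
    # Singular vectors are defined only up to sign, hence the absolute value.
    sims = np.abs(np.array(lead_vecs) @ ref)
    return {int(i) for i in np.flatnonzero(sims >= 1.0 - eps)}


def multi_slice_tricluster(tensor, eps=0.2):
    """Per-mode index sets; their Cartesian product is the tricluster."""
    return tuple(mode_cluster(tensor, m, eps) for m in range(tensor.ndim))


# Usage: Gaussian noise plus a planted 10 x 10 x 10 block of strength 5.
rng = np.random.default_rng(0)
T = rng.normal(size=(30, 30, 30))
T[:10, :10, :10] += 5.0
rows, cols, tubes = multi_slice_tricluster(T, eps=0.2)
```

Slices that cut through a planted dense block share almost the same leading singular vector, while pure-noise slices point in essentially random directions, which is why a single similarity threshold can separate them in this toy setting.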

multi-slice clustering, probability, tensor

arXiv:2109.10803

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)


A New Robust Scalable Singular Value Decomposition Algorithm for Video Surveillance Background Modelling

Roy, Subhrajyoty, Basu, Ayanendranath, Ghosh, Abhik

arXiv.org Machine Learning, Sep-22-2021

A basic algorithmic task in automated video surveillance is to separate background and foreground objects. Camera tampering, noisy videos, low frame rates and similar issues make this problem difficult. A common approach classifies the tampered frames, discards them, and analyses only the remaining frames, which results in a loss of information. We propose a robust singular value decomposition (SVD) approach based on the density power divergence that performs background separation robustly even in the presence of tampered frames. We also provide theoretical results and run simulations that demonstrate the superiority of the proposed method over the few existing robust SVD methods. Finally, we indicate several other use cases of the proposed method to show its general applicability to a wide range of problems.

algorithm, right 0, singular value

arXiv:2109.1068

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Iran (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.63)

Industry:

Banking & Finance > Trading (1.00)
Commercial Services & Supplies > Security & Alarm Services (0.71)
Information Technology > Security & Privacy (0.67)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)


Classification with Nearest Disjoint Centroids

Fraiman, Nicolas, Li, Zichao

arXiv.org Machine Learning, Sep-21-2021

In this paper, we develop a new classification method based on the nearest centroid, called the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in two aspects: (1) the centroids are defined on disjoint subsets of features instead of on all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide a few theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that finds the disjoint subsets of features used in our method, and we extend the algorithm to perform feature selection. We evaluate and compare the performance of our method against other closely related classifiers on both simulated data and real-world gene expression datasets. The results demonstrate that our method can outperform competing classifiers, achieving smaller misclassification rates and/or using fewer features in various settings.
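The classification rule can be sketched in NumPy under two stated assumptions: each class k owns one feature subset S_k (the subsets being pairwise disjoint), and the dimensionality-normalized norm is read as the Euclidean norm divided by the square root of the subset size, ||z||_2 / sqrt(|S_k|). The subsets are supplied by hand here rather than learned with the paper's adapted k-means step, and the names `fit_disjoint_centroids` and `predict_disjoint_centroids` are hypothetical.

```python
import numpy as np


def fit_disjoint_centroids(X, y, subsets):
    """Centroid of class k computed only on its own feature subset S_k."""
    all_idx = [i for idx in subsets.values() for i in idx]
    assert len(all_idx) == len(set(all_idx)), "feature subsets must be disjoint"
    return {k: X[y == k][:, idx].mean(axis=0) for k, idx in subsets.items()}


def predict_disjoint_centroids(X, centroids, subsets):
    """Assign each row of X to the class with the smallest normalized distance."""
    classes = sorted(centroids)
    dists = []
    for k in classes:
        idx = subsets[k]
        diff = X[:, idx] - centroids[k]
        # Assumed dimensionality-normalized norm: ||z||_2 / sqrt(|S_k|).
        dists.append(np.linalg.norm(diff, axis=1) / np.sqrt(len(idx)))
    return np.array(classes)[np.argmin(np.stack(dists, axis=1), axis=1)]


# Usage: class 0 is characterised by features 0-1, class 1 by features 2-3.
X_train = np.array([[0., 0., 9., 9.], [0., 0., 1., 1.],
                    [9., 9., 0., 0.], [1., 1., 0., 0.]])
y_train = np.array([0, 0, 1, 1])
subsets = {0: [0, 1], 1: [2, 3]}
centroids = fit_disjoint_centroids(X_train, y_train, subsets)
X_new = np.array([[0., 0., 5., 5.], [5., 5., 0., 0.]])
pred = predict_disjoint_centroids(X_new, centroids, subsets)
```

Dividing by sqrt(|S_k|) keeps distances computed on subsets of different sizes comparable; with the plain Euclidean norm, a class that owns more features would tend to look farther away from every sample.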

centroid classifier, classifier, misclassification rate

arXiv:2109.10436

Country: Asia > Vietnam > Long An Province > Tân An (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)


Consistency of spectral clustering for directed network community detection

Qing, Huan, Wang, Jingli

arXiv.org Machine Learning, Sep-21-2021

Directed networks appear in various areas, such as biology, sociology, physiology and computer science. However, most current network analyses ignore edge direction. In this paper, we construct a spectral clustering method based on the singular value decomposition of the adjacency matrix to detect communities in the directed stochastic block model (DiSBM). By considering a sparsity parameter, we show that, under some mild conditions, the proposed approach can consistently recover hidden row and column communities for different scalings of degrees. By considering the degree heterogeneity of both row and column nodes, we further establish a theoretical framework for the directed degree-corrected stochastic block model (DiDCSBM). We show that the spectral clustering method stably yields consistent community detection for row clusters and column clusters under mild constraints on the degree heterogeneity. Our theoretical results under DiSBM and DiDCSBM provide new insights into some special directed networks, such as directed networks with balanced clusters, directed networks whose nodes have similar degrees, and the directed Erdős-Rényi graph. Furthermore, our theoretical results under DiDCSBM are consistent with those under DiSBM when DiDCSBM degenerates to DiSBM.
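A minimal NumPy sketch of SVD-based spectral clustering for a directed network, under these assumptions: `A[i, j] = 1` encodes an edge i → j, the number of communities K is known, row (sending) communities come from k-means on the top-K left singular vectors, and column (receiving) communities from k-means on the top-K right singular vectors. The small deterministic `kmeans` helper and the function names are hypothetical, and the sketch omits the degree corrections that the abstract discusses for DiDCSBM.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def directed_spectral_clustering(A, k):
    """Row (sender) and column (receiver) communities from the top-k SVD of A."""
    U, _, Vt = np.linalg.svd(A)
    row_labels = kmeans(U[:, :k], k)    # left singular vectors -> rows
    col_labels = kmeans(Vt[:k].T, k)    # right singular vectors -> columns
    return row_labels, col_labels


# Usage: a two-community directed SBM whose sender and receiver
# partitions differ (first/second half vs. even/odd nodes).
rng = np.random.default_rng(1)
n = 200
z_row = np.repeat([0, 1], n // 2)
z_col = np.arange(n) % 2
B = np.array([[0.7, 0.1], [0.2, 0.6]])
P = B[z_row][:, z_col]              # P[i, j] = B[z_row[i], z_col[j]]
A = (rng.random((n, n)) < P).astype(float)
row_labels, col_labels = directed_spectral_clustering(A, 2)
```

Using the left and right singular vectors separately is what lets the method recover a sender partition and a receiver partition that need not coincide, which a symmetrised adjacency matrix would blur together.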

didcsbm, matrix, stochastic block model

arXiv:2109.10319

Country:

North America > United States (0.14)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)


Exploring Clustering Algorithms: Explanation and Use Cases - neptune.ai

#artificialintelligence, Sep-18-2021, 18:55:59 GMT

Clustering (cluster analysis) is grouping objects based on similarities. Clustering can be used in many areas, including machine learning, computer graphics, pattern recognition, image analysis, information retrieval, bioinformatics, and data compression. Clusters are a tricky concept, which is why there are so many different clustering algorithms. Different cluster models are employed, and for each of these cluster models, different algorithms can be given. Clusters found by one clustering algorithm will often differ from clusters found by a different algorithm. Grouping unlabelled examples is called clustering. As the samples are unlabelled, clustering relies on unsupervised machine learning; if the examples are labelled, the task becomes classification. Knowledge of cluster models is fundamental if you want to understand the differences between various clustering algorithms, and in this article, we're going to explore this topic in depth.

algorithm, centroid, k-means


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


5 Clustering Algorithms Data Scientists Need To Know - The Key Is Always To Understand The Basic Approach Of Any Algorithm You Want To Use – Fly Spaceships With Your Mind

#artificialintelligence, Sep-17-2021, 11:45:19 GMT

As a data scientist, you have several basic tools at your disposal, which you can also apply to a data set in combination. Increasingly complex dependencies arise in the data, which makes it all the more difficult to recognize similar properties and to assign the data to so-called clusters in a way that can be evaluated. You have certainly heard of these algorithms and maybe used one or another, but do you really know what clustering algorithms are? So let's first clarify what these algorithms are.

algorithm, basic approach, clustering algorithm data scientist


Industry:

Government > Military > Air Force (0.40)
Aerospace & Defense (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)


Level Sets or Gradient Lines? A Unifying View of Modal Clustering

Arias-Castro, Ery, Qiao, Wanli

arXiv.org Machine Learning, Sep-17-2021

Up until the 1970s there were two main ways of clustering points in space. One of them, perhaps pioneered by Pearson [44], was to fit a (usually Gaussian) mixture to the data and then classify each data point, as well as any other point available at a later date, according to the most likely component in the mixture. The other was based on a direct partitioning of the space, most notably by minimizing the average minimum squared distance to a center: the K-means problem, whose computational difficulty led to a number of famous algorithms [22, 31, 36, 37, 39] and likely played a role in motivating the development of hierarchical clustering [21, 25, 54, 63]. In the 1970s, two decidedly nonparametric approaches to clustering were proposed, both based on the topography given by the population density. Of course, in practice, the density is estimated, often by some form of kernel density estimation.
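The gradient-line view of density-based (modal) clustering can be illustrated with mean shift, one standard way of following gradient lines of a kernel density estimate: each point is repeatedly moved to the kernel-weighted mean of the data, which is an ascent step on a Gaussian KDE, and points that end at the same mode form one cluster. This is a sketch under assumptions (Gaussian kernel, a fixed bandwidth, modes merged when closer than half the bandwidth), not the procedure analysed in the paper; `mean_shift_modes` and `modal_clusters` are hypothetical names.

```python
import numpy as np


def mean_shift_modes(X, bandwidth, n_iter=300, tol=1e-7):
    """Move every point uphill on a Gaussian KDE of X until it stops moving."""
    Y = X.astype(float).copy()
    for _ in range(n_iter):
        # Gaussian kernel weights between current positions (rows) and data.
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Mean-shift update: kernel-weighted mean of the data.
        Y_new = (w @ X) / w.sum(axis=1, keepdims=True)
        shift = np.max(np.abs(Y_new - Y))
        Y = Y_new
        if shift < tol:
            break
    return Y


def modal_clusters(X, bandwidth):
    """Label points by the mode they reach (modes merged within bandwidth / 2)."""
    modes = mean_shift_modes(X, bandwidth)
    centers, labels = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)


# Usage: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
labels, centers = modal_clusters(X, bandwidth=1.0)
```

The bandwidth acts as the smoothing parameter of the density estimate: if it is too small, noise creates spurious modes; if it is too large, distinct modes merge.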

cluster tree, gradient flow, gradient line

arXiv:2109.08362

Country:

North America > United States > Virginia > Fairfax County > Fairfax (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Information-theoretic Classification Accuracy: A Criterion that Guides Data-driven Combination of Ambiguous Outcome Labels in Multi-class Classification

Zhang, Chihao, Chen, Yiling Elaine, Zhang, Shihua, Li, Jingyi Jessica

arXiv.org Machine Learning, Sep-17-2021

Outcome labeling ambiguity and subjectivity are ubiquitous in real-world datasets. While practitioners commonly combine ambiguous outcome labels in an ad hoc way to improve the accuracy of multi-class classification, there is no principled approach to guide label combination by an optimality criterion. To address this problem, we propose the information-theoretic classification accuracy (ITCA), a criterion of outcome "information" conditional on outcome prediction, to guide practitioners on how to combine ambiguous outcome labels. ITCA balances the trade-off between prediction accuracy (how well predicted labels agree with actual labels) and prediction resolution (how many labels are predictable). To find the optimal label combination indicated by ITCA, we develop two search strategies: greedy search and breadth-first search. Notably, ITCA and the two search strategies can be used with any machine-learning classification algorithm. Coupled with a classification algorithm and a search strategy, ITCA has two uses: to improve prediction accuracy and to identify ambiguous labels. We first verify that ITCA achieves high accuracy with both search strategies in finding the correct label combinations on synthetic and real data. We then demonstrate the effectiveness of ITCA in diverse applications including medical prognosis, cancer survival prediction, user demographics prediction, and cell type classification.

algorithm, class combination, classification algorithm

arXiv:2109.00582

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.61)


Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health

Mate, Aditya, Madaan, Lovish, Taneja, Aparna, Madhiwalla, Neha, Verma, Shresth, Singh, Gargi, Hegde, Aparna, Varakantham, Pradeep, Tambe, Milind

arXiv.org Artificial Intelligence, Sep-16-2021

The widespread availability of cell phones has enabled non-profits to deliver critical health information to their beneficiaries in a timely manner. This paper describes our work to assist non-profits that employ automated messaging programs to deliver timely preventive care information to beneficiaries (new and expecting mothers) during pregnancy and after delivery. Unfortunately, a key challenge in such information delivery programs is that a significant fraction of beneficiaries drop out of the program. Moreover, non-profits often have limited health-worker resources (time) to place crucial service calls for live interaction with beneficiaries to prevent such engagement drops. To help non-profits optimize this limited resource, we developed a Restless Multi-Armed Bandits (RMABs) system. One key technical contribution of this system is a novel method for clustering offline historical data to infer unknown RMAB parameters. Our second major contribution is an evaluation of our RMAB system in collaboration with an NGO, via a real-world service quality improvement study. The study compared strategies for optimizing service calls to 23,003 participants over a period of 7 weeks to reduce engagement drops. We show that the RMAB group provides a statistically significant improvement over other comparison groups, reducing engagement drops by ~30%. To the best of our knowledge, this is the first study demonstrating the utility of RMABs in real-world public health settings. We are transitioning our RMAB system to the NGO for real-world use.

beneficiary, rmab, service call

arXiv:2109.08075

Country:

Asia > India (0.04)
Asia > Singapore (0.04)
Africa > Nigeria (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.76)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
