AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

MREC: a fast and versatile framework for aligning and matching point clouds with applications to single cell molecular data

Blumberg, Andrew J., Carriere, Mathieu, Mandell, Michael A., Rabadan, Raul, Villar, Soledad

arXiv.org Machine Learning, Jan-7-2020

Comparing and aligning large datasets is a pervasive problem occurring across many different knowledge domains. We introduce and study MREC, a recursive decomposition algorithm for computing matchings between data sets. The basic idea is to partition the data, match the partitions, and then recursively match the points within each pair of identified partitions. The matching itself is done using black box matching procedures that are too expensive to run on the entire data set. Using an absolute measure of the quality of a matching, the framework supports optimization over parameters including partitioning procedures and matching algorithms. By design, MREC can be applied to extremely large data sets. We analyze the procedure to describe when we can expect it to work well and demonstrate its flexibility and power by applying it to a number of alignment problems arising in the analysis of single cell molecular data.
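The partition, match, and recurse idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the MREC implementation: the partitioning rule (sort and split in half), the brute-force matcher standing in for the "black box" procedure, the squared-distance cost, and all function names are hypothetical choices. It also assumes both point sets have the same size and that blocks stay equal in size at every level.

```python
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def brute_force_match(xs, ys):
    """Stand-in for an expensive black-box matcher (only usable on tiny inputs).

    Returns a tuple perm such that xs[i] is matched to ys[perm[i]].
    """
    return min(
        permutations(range(len(ys))),
        key=lambda perm: sum(sq_dist(x, ys[j]) for x, j in zip(xs, perm)),
    )


def centroid(points):
    """Coordinate-wise mean, used as the representative of a block."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def split_in_half(points):
    """Hypothetical partitioning rule: sort lexicographically, cut in the middle."""
    pts = sorted(points)
    mid = len(pts) // 2
    return [pts[:mid], pts[mid:]]


def recursive_match(xs, ys, threshold=2):
    """Match xs to ys by matching blocks first, then points inside matched blocks."""
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles equal-size blocks")
    if len(xs) <= threshold:
        perm = brute_force_match(xs, ys)
        return [(x, ys[j]) for x, j in zip(xs, perm)]
    x_parts, y_parts = split_in_half(xs), split_in_half(ys)
    # Match the blocks through their representatives (centroids).
    perm = brute_force_match(
        [centroid(p) for p in x_parts], [centroid(p) for p in y_parts]
    )
    pairs = []
    for x_part, j in zip(x_parts, perm):
        pairs.extend(recursive_match(x_part, y_parts[j], threshold))
    return pairs


# Usage: match eight points on a line to a slightly shifted, reordered copy.
source = [(float(i), 0.0) for i in range(8)]
target = [(x + 0.1, 0.05) for x, _ in reversed(source)]
matching = recursive_match(source, target)
```

The expensive matcher is only ever called on at most `threshold` points or on the two block representatives, which is what lets the recursion scale to inputs where a single global call would be too costly.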

application, gromov-wasserstein distance, mrec, (16 more...)

arXiv.org Machine Learning

2001.01666

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.47)
Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

All Machine Learning Models Explained in 6 Minutes

#artificialintelligence, Jan-6-2020, 19:24:18 GMT

In my previous article, I explained what regression is and showed how it can be applied. This week, I'm going to go over the majority of common machine learning models used in practice, so that I can spend more time building and improving models rather than explaining the theory behind them. All machine learning models are categorized as either supervised or unsupervised. If the model is a supervised model, it's then sub-categorized as either a regression or classification model. We'll go over what these terms mean and the corresponding models that fall into each category below.

decision tree, linear regression, regression, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Clustering Binary Data by Application of Combinatorial Optimization Heuristics

Trejos-Zelaya, Javier, Amaya-Briceño, Luis Eduardo, Jiménez-Romero, Alejandra, Murillo-Fernández, Alex, Piza-Volio, Eduardo, Villalobos-Arias, Mario

arXiv.org Machine Learning, Jan-6-2020

We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters. Five new and original methods are introduced, using neighborhood-based and population-based combinatorial optimization metaheuristics: the first three are simulated annealing, threshold accepting and tabu search, and the other two are a genetic algorithm and ant colony optimization. The methods are implemented, with proper calibration of the heuristics' parameters to ensure good results. From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, with hierarchical clustering, and a version of k-means: partitioning around medoids or PAM. Simulated annealing performs very well, especially compared to classical methods.
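As a rough illustration of how simulated annealing can be applied to partitioning binary data, the sketch below moves one object at a time between clusters and accepts worse partitions with a temperature-dependent probability. The compactness criterion (sum of within-cluster pairwise L1 distances), the cooling schedule, the parameter values, and the function names are assumptions made for this example; the paper's own aggregation criteria and calibration are not reproduced here.

```python
import math
import random


def l1(a, b):
    """L1 dissimilarity; on 0/1 vectors this equals the Hamming distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def within_cluster_l1(data, labels):
    """Assumed compactness criterion: total L1 distance over same-cluster pairs."""
    total = 0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if labels[i] == labels[j]:
                total += l1(data[i], data[j])
    return total


def anneal(data, k, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Simulated annealing over partitions into k clusters.

    Returns the best labelling seen and its criterion value.
    """
    rng = random.Random(seed)
    labels = [i % k for i in range(len(data))]  # deterministic starting partition
    cost = within_cluster_l1(data, labels)
    best_labels, best_cost = labels[:], cost
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(data))
        new_label = rng.randrange(k)
        if new_label == labels[i]:
            continue
        old_label = labels[i]
        labels[i] = new_label
        new_cost = within_cluster_l1(data, labels)
        # Always accept improvements; accept a worse move with Metropolis probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        else:
            labels[i] = old_label
        temp *= cooling
    return best_labels, best_cost


# Usage: six binary objects, two clusters.
table = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1),
         (1, 1, 1, 0), (0, 0, 1, 0), (1, 1, 0, 1)]
labels, value = anneal(table, k=2)
```

Recomputing the criterion from scratch at every step keeps the sketch short; a practical version would update it incrementally when a single object changes cluster.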

binary data, combinatorial optimization, costa rica, (14 more...)

arXiv.org Machine Learning

2001.01809

Country:

North America > United States > New York (0.05)
Africa > Liberia (0.05)
North America > Costa Rica > Cartago Province > Cartago (0.04)
(2 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Cutoff for exact recovery of Gaussian mixture models

Chen, Xiaohui, Yang, Yun

arXiv.org Machine Learning, Jan-5-2020

We determine the cutoff value on the separation of cluster centers for exact recovery of cluster labels in a $K$-component Gaussian mixture model with equal cluster sizes. Moreover, we show that a semidefinite programming (SDP) relaxation of the $K$-means clustering method achieves this sharp threshold for exact recovery without assuming symmetry of the cluster centers.
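For orientation, one widely used way to write an SDP relaxation of $K$-means is shown below. The abstract does not spell out its formulation, so this form (usually attributed to Peng and Wei) and the symbols $X_i$, $G_k$, $A$, and $Z$ are assumptions introduced for illustration.

```latex
% K-means over partitions G_1, ..., G_K of {1, ..., n} for data X_1, ..., X_n
\min_{G_1,\dots,G_K} \sum_{k=1}^{K} \sum_{i \in G_k}
  \Bigl\| X_i - \frac{1}{|G_k|} \sum_{j \in G_k} X_j \Bigr\|^2
\;\Longleftrightarrow\;
\max_{G_1,\dots,G_K} \sum_{k=1}^{K} \frac{1}{|G_k|}
  \sum_{i,j \in G_k} \langle X_i, X_j \rangle

% SDP relaxation with Gram matrix A_{ij} = \langle X_i, X_j \rangle
\max_{Z \in \mathbb{R}^{n \times n}} \langle A, Z \rangle
\quad \text{s.t.} \quad
Z = Z^{\top},\; Z \succeq 0,\; Z \ge 0 \text{ (entrywise)},\;
Z \mathbf{1}_n = \mathbf{1}_n,\; \operatorname{tr}(Z) = K
```

The equivalence in the first display holds because the total $\sum_i \|X_i\|^2$ does not depend on the partition. A true partition corresponds to $Z = \sum_k |G_k|^{-1} \mathbf{1}_{G_k} \mathbf{1}_{G_k}^{\top}$, which satisfies every constraint; exact recovery means the SDP solution coincides with this matrix.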

exact recovery, inequality, mixture model, (15 more...)

arXiv.org Machine Learning

2001.01194

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
(6 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Data Curves Clustering Using Common Patterns Detection

Xylogiannopoulos, Konstantinos F.

arXiv.org Artificial Intelligence, Jan-5-2020

Over the past decades we have experienced an enormous expansion of the accumulated data that humanity produces. Every day, numerous smart devices, usually interconnected over the internet, produce vast real-valued datasets. Time series representing datasets from completely unrelated domains such as finance, weather, medical applications, traffic control etc. are becoming more and more crucial in daily life. Analyzing and clustering these time series, or in general any kind of curves, could be critical for several human activities. In the current paper, the new Curves Clustering Using Common Patterns (3CP) methodology is introduced, which applies a repeated pattern detection algorithm in order to cluster sequences according to their shape and the similarities of common patterns between time series, data curves and eventually any kind of discrete sequences. For this purpose, the Longest Expected Repeated Pattern Reduced Suffix Array (LERP-RSA) data structure has been used in combination with the All Repeated Patterns Detection (ARPaD) algorithm in order to perform highly accurate and efficient detection of similarities among data curves that can be used for clustering purposes and which also provides additional flexibility and features.

lerp, sequence, time series, (13 more...)

arXiv.org Artificial Intelligence

2001.02095

Country:

North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
North America > United States > New York (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.88)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Root Cause Detection Among Anomalous Time Series Using Temporal State Alignment

Chakraborty, Sayan, Shah, Smit, Soltani, Kiumars, Swigart, Anna

arXiv.org Machine Learning, Jan-4-2020

The recent increase in the scale and complexity of software systems has introduced new challenges to the time series monitoring and anomaly detection process. A major drawback of existing anomaly detection methods is that they lack contextual information to help stakeholders identify the cause of anomalies. This problem, known as root cause detection, is particularly challenging to undertake in today's complex distributed software systems since the metrics under consideration generally have multiple internal and external dependencies. Significant manual analysis and strong domain expertise are required to isolate the correct cause of the problem. In this paper, we propose a method that isolates the root cause of an anomaly by analyzing the patterns in time series fluctuations. Our method considers the time series as observations from an underlying process passing through a sequence of discretized hidden states. The idea is to track the propagation of the effect when a given problem causes unaligned but homogeneous shifts of the underlying states. We evaluate our approach by finding the root cause of anomalies in Zillow's clickstream data by identifying causal patterns among a set of observed fluctuations.

anomaly, detection, time series, (14 more...)

arXiv.org Machine Learning

2001.01056

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > Texas > Bexar County > San Antonio (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
(2 more...)

Quantum Interference for Counting Clusters

Muthyala, Rohit R, Geiger, Davi, Kedem, Zvi M.

arXiv.org Machine Learning, Jan-3-2020

Counting the number of clusters when these clusters overlap significantly is a challenging problem in machine learning. We argue that a purely mathematical quantum theory, formulated using the path integral technique, when applied to non-physics modeling leads to non-physics quantum theories that are statistical in nature. We show that a quantum theory can be a more robust statistical theory to separate data to count overlapping clusters. The theory is also confirmed by data simulations. This work identifies how quantum theory can be effective in counting clusters and hopes to inspire the field to further apply such techniques.

gaussian distribution, probability, quantum probability, (14 more...)

arXiv.org Machine Learning

2001.04251

Country: North America > United States (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

Zhang, Shixiong, Li, Xiangtao, Lin, Qiuzhen, Wong, Ka-Chun

arXiv.org Machine Learning, Jan-3-2020

In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner. Unsupervised learning such as data clustering has become the central component to identify and characterize novel cell types and gene expression patterns. In this study, we review the existing single-cell RNA-seq data clustering methods with critical insights into the related advantages and limitations. In addition, we also review the upstream single-cell RNA-seq data processing techniques such as quality control, normalization, and dimension reduction. We conduct performance comparison experiments to evaluate several popular single-cell RNA-seq clustering approaches on two single-cell transcriptomic datasets.

cell type, dataset, single-cell rna-seq data, (13 more...)

arXiv.org Machine Learning

2001.01006

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Portugal > Castelo Branco > Castelo Branco (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Robust Marine Buoy Placement for Ship Detection Using Dropout K-Means

Ng, Yuting, Pereira, João M., Garagic, Denis, Tarokh, Vahid

arXiv.org Machine Learning, Jan-2-2020

Marine buoys aid in the battle against Illegal, Unreported and Unregulated (IUU) fishing by detecting fishing vessels in their vicinity. Marine buoys, however, may be disrupted by natural causes and buoy vandalism. To minimize the effects of buoy disruption on a buoy network, we propose a more robust buoy placement using dropout k-means and dropout k-median. We apply dropout k-means and dropout k-median to determine locations for deploying marine buoys in the Gabonese waters near West Africa. We simulated the passage of ships using historical Automatic Identification System (AIS) data, then compared the ship detection probability of dropout k-means to classic k-means and dropout k-median to classic k-median, taking into account that the current sensor detection radius is 10 km. With 5 buoys, the buoy arrangements computed by classic k-means, dropout k-means, classic k-median and dropout k-median have ship detection probabilities of 38%, 45%, 48% and 52%, respectively.
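To make the evaluation quantity concrete, the sketch below computes a detection probability as the fraction of ship positions lying within the sensor radius of at least one buoy, and then the worst case when any single buoy is lost. It is a simplified stand-in rather than the paper's procedure: it uses static positions in planar kilometre coordinates instead of simulated AIS tracks, it does not implement dropout k-means itself, and the function names are hypothetical.

```python
import math


def detection_probability(ships, buoys, radius_km=10.0):
    """Fraction of ship positions within radius_km of at least one buoy."""
    if not ships:
        return 0.0
    detected = sum(
        1 for ship in ships
        if any(math.dist(ship, buoy) <= radius_km for buoy in buoys)
    )
    return detected / len(ships)


def worst_case_single_dropout(ships, buoys, radius_km=10.0):
    """Lowest detection probability over all ways of losing exactly one buoy."""
    return min(
        detection_probability(ships, buoys[:i] + buoys[i + 1:], radius_km)
        for i in range(len(buoys))
    )


# Usage: four ship positions and two buoys, coordinates in km.
ship_positions = [(0.0, 0.0), (5.0, 0.0), (30.0, 0.0), (100.0, 0.0)]
buoy_positions = [(0.0, 0.0), (30.0, 5.0)]
full_network = detection_probability(ship_positions, buoy_positions)
one_buoy_lost = worst_case_single_dropout(ship_positions, buoy_positions)
```

Comparing the full-network figure with the single-dropout figure is one simple way to see how much a given placement depends on any one buoy.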

cluster center, dropout k-means, probability, (12 more...)

arXiv.org Machine Learning

2001.00564

Country:

North America > United States (0.30)
Africa > West Africa (0.25)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.40)

Industry:

Food & Agriculture > Fishing (0.67)
Transportation (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Motivic clustering schemes for directed graphs

Pinto, Guilherme Vituri F., Mémoli, Facundo

arXiv.org Machine Learning, Jan-1-2020

Motivated by the concept of network motifs we construct certain clustering methods (functors) which are parametrized by a given collection of motifs (or representers).

functor, graph, nullz, (16 more...)

arXiv.org Machine Learning

2001.00278

Country:

South America > Brazil (0.14)
North America > United States > Ohio (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)
