AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Unobserved classes and extra variables in high-dimensional discriminant analysis

Fop, Michael, Mattei, Pierre-Alexandre, Bouveyron, Charles, Murphy, Thomas Brendan

arXiv.org Machine Learning, Feb-3-2021

In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the same units in the test data may be measured on a set of additional variables recorded at a subsequent stage with respect to when the learning sample was collected. In this situation, the classifier built in the learning phase needs to adapt to handle potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for data of large dimensions. A simulation study and an artificial experiment related to classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.

classification, discriminant analysis, test data, (16 more...)

arXiv.org Machine Learning

2102.01982

Country:

Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Clustering with Penalty for Joint Occurrence of Objects: Computational Aspects

Sokol, Ondřej, Holý, Vladimír

arXiv.org Artificial Intelligence, Feb-2-2021

The idea of clustering with a penalty for joint occurrence of objects is to minimize the occurrence of multiple objects from the same cluster in the same set. In the current paper, we study computational aspects of the method. First, we prove that the problem of finding the optimal clustering is NP-hard. Second, to find a suitable clustering numerically, we propose a genetic algorithm augmented by a renumbering procedure, a fast task-specific local search heuristic, and an initial solution based on a simplified model. Third, in a simulation study, we demonstrate that our improvements to the standard genetic algorithm significantly enhance its computational performance.

algorithm, genetic algorithm, local search, (13 more...)

arXiv.org Artificial Intelligence

2102.01424

Country: Europe > Czechia > Prague (0.05)

Genre: Research Report (0.40)

Industry: Retail (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)

Profiling Market Segments using K-Means Clustering

#artificialintelligence, Feb-1-2021, 04:15:12 GMT

Each individual is different, and so are their preferences. With such a diversity of needs and preferences, how do you serve all of them? Most importantly, how do you as a business know which customers to target, and how do you form the right marketing strategy for each of them? In this customer-centric world, segmentation is your answer. In this article, we explore what customer profiling is, how to build it in Python, and how to interpret the results.

customer, individual segment-wise average, profiling market segment, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.65)
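
The profiling step mentioned in the summary above can be illustrated with a small sketch. It assumes cluster labels are already available (for example from K-means) and compares each segment's average with the overall average for every feature. This is not code from the linked article: the function name `profile_segments`, the feature names, and the customer records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def profile_segments(rows, labels):
    """Return {segment: {feature: (segment_mean, overall_mean)}}.

    rows   -- list of dicts mapping feature name to a numeric value
    labels -- one segment label per row (assumed to come from a clustering step)
    """
    features = list(rows[0].keys())
    # Overall average of every feature across all customers.
    overall = {f: sum(r[f] for r in rows) / len(rows) for f in features}

    # Group the rows by their segment label.
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    # Segment-wise average next to the overall average, per feature.
    profile = {}
    for label, members in grouped.items():
        profile[label] = {
            f: (sum(m[f] for m in members) / len(members), overall[f])
            for f in features
        }
    return profile


# Hypothetical customers and hypothetical segment labels.
customers = [
    {"age": 25, "annual_spend": 200},
    {"age": 35, "annual_spend": 400},
    {"age": 50, "annual_spend": 1500},
    {"age": 60, "annual_spend": 1900},
]
segments = [0, 0, 1, 1]

profile = profile_segments(customers, segments)
for segment, stats in sorted(profile.items()):
    print(segment, stats)
```

A segment whose average sits well above or below the overall average on a feature is a candidate for a descriptive label such as "younger, low spend"; how large a gap counts as meaningful is a judgment call that this sketch does not make.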

Machine Learning in SQL -- it actually works!

#artificialintelligence, Jan-31-2021, 23:10:04 GMT

Sometimes it is hard to believe that a world before ML existed. So many modern data analyses are built on top of ML techniques, and will continue to be for the foreseeable future. However, not everyone is able to benefit from these advances, because using ML techniques mostly involves using Python, developing code, and understanding many new technologies. Things get especially messy when Big Data and distributed systems enter the game. This is a problem that SQL query engines are trying to solve.

algorithm, dataset, machine learning, (10 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

A Multiscale Environment for Learning by Diffusion

Murphy, James M., Polk, Sam L.

arXiv.org Machine Learning, Jan-31-2021

The clustering problem is very general, and different partitions of the same dataset could be considered correct and useful. To fully understand such data, it must be considered at a variety of scales, ranging from coarse to fine. We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model, which is a family of clusterings parameterized by nonlinear diffusion on the dataset. We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis. To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Nonlinear Diffusion (M-LUND) clustering algorithm, which is derived from a diffusion process at a range of temporal scales. We provide theoretical guarantees for the algorithm's performance and establish its computational efficiency. Finally, we show that the M-LUND clustering algorithm detects the latent structure in a range of synthetic and real datasets.

algorithm, dataset, diffusion distance, (13 more...)

arXiv.org Machine Learning

2102.005

Country:

North America > United States > California > Orange County > Irvine (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Medford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Using Recursive KMeans and Dijkstra Algorithm to Solve CVRP

Moussa, Hassan

arXiv.org Artificial Intelligence, Jan-31-2021

The capacitated vehicle routing problem (CVRP) is one of the most common optimization problems today, given the wide use of routing algorithms in fields such as transportation, food delivery, and network routing. CVRP is classified as an NP-hard problem, so exact optimization algorithms are not expected to solve it efficiently. In our paper, we discuss a new way to solve the problem, using a recursive application of the best-known clustering algorithm, "K-Means", one of the well-known shortest-path algorithms, "Dijkstra", and some mathematical operations. We show how to combine these methods to obtain a solution close to the optimal route. Since research and development are still ongoing, this paper may be followed by another that presents the implementation results of this theoretical work.

algorithm, node, vehicle, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.13140/RG.2.2.20970.85447

2102.00567

Genre: Research Report (0.40)

Industry: Transportation > Freight & Logistics Services (0.79)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Understanding Core Data Science Algorithms: K-Means and K-Medoids Clustering - DZone Big Data

#artificialintelligence, Jan-29-2021, 20:31:16 GMT

Clustering is one of the major techniques used for statistical data analysis. As the term suggests, "clustering" is the process of gathering similar objects into groups, or of distributing a dataset into subsets according to a defined distance measure. K-means clustering is touted as a foundational algorithm every data scientist ought to have in their toolbox. K-means and k-medoids are partitional clustering methods: they start from a specified number of groups and then iteratively reallocate objects among those groups. The algorithm works by first segregating all the points into an already selected number of clusters.

algorithm, clustering, euclidean distance, (11 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
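
The iterative reallocation described in the summary above can be sketched in plain Python. This is a minimal illustration of the standard K-means (Lloyd) loop, not code from the linked article. The function names `kmeans` and `squared_distance`, the random choice of initial centroids, the fixed seed, and the sample points are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct input points drawn at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Two well-separated groups of hypothetical points.
points = [
    (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
    (101.0, 101.0), (102.0, 102.0), (103.0, 103.0),
]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
print(labels)
```

K-medoids follows the same assign-and-update pattern but restricts each cluster centre to be one of the data points, which makes it less sensitive to outliers at a higher computational cost.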

Generative hypergraph clustering: from blockmodels to modularity

Chodrow, Philip S., Veldt, Nate, Benson, Austin R.

arXiv.org Machine Learning, Jan-27-2021

Hypergraphs are a natural modeling paradigm for a wide range of complex relational systems with multibody interactions. A standard analysis task is to identify clusters of closely related or densely interconnected nodes. While many probabilistic generative models for graph clustering have been proposed, there are relatively few such models for hypergraphs. We propose a Poisson degree-corrected hypergraph stochastic blockmodel (DCHSBM), an expressive generative model of clustered hypergraphs with heterogeneous node degrees and edge sizes. Approximate maximum-likelihood inference in the DCHSBM naturally leads to a clustering objective that generalizes the popular modularity objective for graphs. We derive a general Louvain-type algorithm for this objective, as well as a faster, specialized "All-Or-Nothing" (AON) variant in which edges are expected to lie fully within clusters. This special case encompasses a recent proposal for modularity in hypergraphs, while also incorporating flexible resolution and edge-size parameters. We show that hypergraph Louvain is highly scalable, including as an example an experiment on a synthetic hypergraph of one million nodes. We also demonstrate through synthetic experiments that the detectability regimes for hypergraph community detection differ from methods based on dyadic graph projections. In particular, there are regimes in which hypergraph methods can recover planted partitions even though graph-based methods necessarily fail due to information-theoretic limits. We use our model to analyze different patterns of higher-order structure in school contact networks, U.S. congressional bill cosponsorship, U.S. congressional committees, product categories in co-purchasing behavior, and hotel locations from web browsing sessions, and show that it is able to recover ground truth clusters in empirical data sets exhibiting the corresponding higher-order structure.

algorithm, hypergraph, node, (13 more...)

arXiv.org Machine Learning

2101.09611

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
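
For readers unfamiliar with the "modularity objective for graphs" that the abstract above says it generalizes, the sketch below computes the standard Newman-Girvan modularity of an ordinary (dyadic) undirected graph. It is background only and is not the hypergraph objective or the Louvain algorithm from the paper. The function name `modularity`, the edge list, and the community labels are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of an undirected, unweighted simple graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping every node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (internal edge fraction)
    #     minus (expected fraction under a degree-preserving null model).
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

q = modularity(edges, community)
print(q)  # 5/14, about 0.357
```

Putting every node in a single community gives a modularity of zero, which is why the score is read as "more internal edges than expected by chance" rather than as a raw edge count.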

Pitfalls of Assessing Extracted Hierarchies for Multi-Class Classification

del Moral, Pablo, Nowaczyk, Slawomir, Sant'Anna, Anita, Pashami, Sepideh

arXiv.org Machine Learning, Jan-26-2021

Using hierarchies of classes is one of the standard methods to solve multi-class classification problems. In the literature, selecting the right hierarchy is considered to play a key role in improving classification performance. Although different methods have been proposed, there is still a lack of understanding of what makes one hierarchy-extraction method perform better or worse than another. To this end, we analyze and compare some of the most popular approaches to extracting hierarchies. We identify some common pitfalls that may lead practitioners to draw misleading conclusions about their methods. In addition, to address some of these problems, we demonstrate that using random hierarchies is an appropriate benchmark to assess how the hierarchy's quality affects the classification performance. In particular, we show how the hierarchy's quality can become irrelevant depending on the experimental setup: when using powerful enough classifiers, the final performance is not affected by the quality of the hierarchy. We also show how comparing the effect of the hierarchies against non-hierarchical approaches might incorrectly indicate their superiority. Our results confirm that datasets with a high number of classes generally present complex structures in how these classes relate to each other. In these datasets, the right hierarchy can dramatically improve classification performance.

algorithm, classifier, hierarchy, (13 more...)

arXiv.org Machine Learning

2101.11095

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Sweden > Halland County > Halmstad (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Adaptive Neuro Fuzzy Networks based on Quantum Subtractive Clustering

Mousavi, Ali, Jalali, Mehrdad, Yaghoubi, Mahdi

arXiv.org Artificial Intelligence, Jan-26-2021

Data mining techniques can be used to discover useful patterns by exploring and analyzing data, and it is feasible to combine machine learning tools synergistically to discover fuzzy classification rules. In this paper, an adaptive neuro-fuzzy network of TSK fuzzy type with an improved quantum subtractive clustering has been developed. Quantum clustering (QC) draws its intuition from quantum mechanics and uses the Schrödinger potential and a time-consuming gradient descent method. The principal advantages and shortcomings of QC are analyzed and, based on its shortcomings, an improved algorithm using a subtractive clustering method is proposed. Cluster centers represent a general model with the essential characteristics of the data, which can be used as the premise part of fuzzy rules. The experimental results revealed that the proposed ANFIS based on quantum subtractive clustering yielded good approximation and generalization capabilities and an impressive decrease in the number of fuzzy rules and network output accuracy in comparison with traditional methods.

cluster center, equation, fuzzy rule, (11 more...)

arXiv.org Artificial Intelligence

2102.0082

Country:

Asia > Middle East > Iran > Razavi Khorasan Province > Mashhad (0.05)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > United States > Massachusetts (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
