AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

A Ubiquitous Unifying Degeneracy in Two-Body Microlensing Systems

Zhang, Keming, Gaudi, B. Scott, Bloom, Joshua S.

arXiv.org Artificial Intelligence, May-10-2022

While gravitational microlensing by planetary systems provides unique vistas on the properties of exoplanets, observations of a given 2-body microlensing event can often be interpreted with multiple distinct physical configurations. Such ambiguities are typically attributed to the close-wide and inner-outer types of degeneracies that arise from transformation invariances and symmetries of microlensing caustics. However, there remain unexplained inconsistencies between aforementioned theories and observations. Here, leveraging a fast machine learning inference framework, we present the discovery of the offset degeneracy, which concerns a magnification-matching behaviour on the lens-axis and is formulated independent of caustics. This offset degeneracy unifies the close-wide and inner-outer degeneracies, generalises to resonant topologies, and upon reanalysis, not only appears ubiquitous in previously published planetary events with 2-fold degenerate solutions, but also resolves prior inconsistencies. Our analysis demonstrates that degenerate caustics do not strictly result in degenerate magnifications and that the commonly invoked close-wide degeneracy essentially never arises in actual events. Moreover, it is shown that parameters in offset degenerate configurations are related by a simple expression. This suggests the existence of a deeper symmetry in the equations governing 2-body lenses than previously recognised.

close-wide degeneracy, degeneracy, trajectory, (13 more...)

doi: 10.1038/s41550-022-01671-6

2111.13696

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Streaming Inference for Infinite Non-Stationary Clustering

Schaeffer, Rylan, Liu, Gabrielle Kaili-May, Du, Yilun, Linderman, Scott, Fiete, Ila Rani

arXiv.org Artificial Intelligence, May-2-2022

Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents. Here, we attack learning under all three conditions (unsupervised, streaming, non-stationary) in the context of clustering, also known as mixture modeling. We introduce a novel clustering algorithm that endows mixture models with the ability to create new clusters online, as demanded by the data, in a probabilistic, time-varying, and principled manner. To achieve this, we first define a novel stochastic process called the Dynamical Chinese Restaurant Process (Dynamical CRP), which is a non-exchangeable distribution over partitions of a set; next, we show that the Dynamical CRP provides a non-stationary prior over cluster assignments and yields an efficient streaming variational inference algorithm. We conclude with experiments showing that the Dynamical CRP can be applied on diverse synthetic and real data with Gaussian and non-Gaussian likelihoods.
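
As background for the abstract above, the following is a minimal sketch of the classical (exchangeable) Chinese Restaurant Process, the prior that the Dynamical CRP builds on. It is not the authors' Dynamical CRP, which is non-exchangeable and time-varying; the function name `sample_crp` and the parameters `alpha` and `seed` are illustrative assumptions.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw one partition from a classical CRP with concentration alpha > 0.

    Returns (assignments, table_counts): assignments[i] is the table
    (cluster) index of customer i, and table_counts[k] is the size of table k.
    """
    rng = random.Random(seed)
    assignments = []
    table_counts = []
    for _ in range(n_customers):
        # An existing table is chosen with weight equal to its current size;
        # a new table is opened with weight alpha.
        weights = table_counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_counts):
            table_counts.append(1)  # open a new table (new cluster)
        else:
            table_counts[table] += 1
        assignments.append(table)
    return assignments, table_counts


assignments, table_counts = sample_crp(200, alpha=1.0, seed=0)
print(len(table_counts), "clusters created for 200 observations")
```

Because the number of tables is not fixed in advance, new clusters can appear as more observations arrive, which is the property the abstract refers to as creating clusters "as demanded by the data".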

artificial intelligence, dynamical crp, machine learning, (15 more...)

2205.01212

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.50)

Industry: Consumer Products & Services > Restaurants (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Adversarially Learned Mixture Model

Jesson, Andrew, Low-Kam, Cécile, Nair, Tanya, Soudan, Florian, Chandelier, Florent, Chapados, Nicolas

arXiv.org Machine Learning, Apr-23-2022

The Adversarially Learned Mixture Model (AMM) is a generative model for unsupervised or semi-supervised data clustering. The AMM is the first adversarially optimized method to model the conditional dependence between inferred continuous and categorical latent variables. Experiments on the MNIST and SVHN datasets show that the AMM allows for semantic separation of complex data when little or no labeled data is available. The AMM achieves a state-of-the-art unsupervised clustering error rate of 2.86% on the MNIST dataset. A semi-supervised extension of the AMM yields competitive results on the SVHN dataset.

artificial intelligence, leak 0, machine learning, (16 more...)

1807.05344

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Machine Learning Algebraic Geometry for Physics

Bao, Jiakang, He, Yang-Hui, Heyes, Elli, Hirst, Edward

arXiv.org Machine Learning, Apr-21-2022

The ubiquitous interrelations between algebraic geometry and physics have for centuries produced fruitful phenomena in both fields. With connections made as far back as Archimedes, whose work on conic sections aided the development of concepts surrounding motion under gravity, physical understanding has largely relied upon the mathematical tools available. In the modern era, these two fields are still heavily intertwined, with particular relevance in addressing one of the most significant problems of our time: quantising gravity. String theory, as a candidate for this theory of everything, relies heavily on algebraic geometry constructions to define its spacetime and to interpret its matter. However, where new mathematical tools arise, their implementation is not always simple.

artificial intelligence, arxiv, machine learning, (16 more...)

2204.10334

Country:

Europe > Jersey (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(4 more...)

Genre: Research Report > New Finding (0.92)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space

Cheng, Lixue, Sun, Jiace, Miller, Thomas F. III

arXiv.org Artificial Intelligence, Apr-20-2022

We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and having improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact Gaussian process regression (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized datasets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over other training protocols for MOB-ML, i.e., supervised regression-clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with those from the literature on the same benchmark datasets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
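
The general idea of clustering with a GMM and then fitting a local regression model in each cluster can be sketched on toy data. This is only an illustration under strong assumptions: the data are one-dimensional and synthetic (not MOB-ML features), the mixture has two components fitted by a hand-written EM loop, and the local model is ordinary linear regression. All function names here are hypothetical.

```python
import math


def fit_gmm_1d(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM (toy implementation)."""
    mean_all = sum(xs) / len(xs)
    var_all = sum((x - mean_all) ** 2 for x in xs) / len(xs)
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [var_all, var_all]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid collapse
    return weights, means, variances


def assign(x, params):
    """Hard-assign x to the component with the highest weighted log-density."""
    weights, means, variances = params
    scores = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
              for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic data: two regimes that follow different linear laws.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
ys = [2 * x + 1 if x < 5 else -x + 5 for x in xs]

params = fit_gmm_1d(xs)
groups = {0: [], 1: []}
for x, y in zip(xs, ys):
    groups[assign(x, params)].append((x, y))
local_models = {k: fit_line(pairs) for k, pairs in groups.items()}


def predict(x):
    """Route x to its cluster, then apply that cluster's local model."""
    slope, intercept = local_models[assign(x, params)]
    return slope * x + intercept


print(predict(0.25), predict(10.0))
```

A single global line would fit neither regime well here, whereas each local model only has to describe its own cluster.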

artificial intelligence, machine learning, molecule, (18 more...)

doi: 10.1021/acs.jctc.2c00396

2204.09831

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

5 Clustering Methods in Machine Learning

#artificialintelligence, Apr-18-2022, 16:06:02 GMT

To begin, let's review some common terminology. A cluster is a group of objects that belong to the same class; in other words, objects with similar properties are grouped in one cluster, and dissimilar objects are placed in another. Clustering is the process of dividing objects into a number of groups such that the objects within each group are more similar to each other than to objects in other groups. Put simply, it segments items with similar properties or behaviour and assigns them to clusters. As an important analysis method in machine learning, clustering is used to identify patterns and structure in labelled and unlabelled datasets. Clustering is an exploratory data analysis technique that can identify subgroups in data such that data points in the same subgroup (cluster) are very similar to each other and data points in separate clusters have different characteristics.

algorithm, customer, dataset, (16 more...)

Industry:

Health & Medicine (0.48)
Information Technology > Services (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A Greedy and Optimistic Approach to Clustering with a Specified Uncertainty of Covariates

Okuno, Akifumi, Hattori, Kohei

arXiv.org Machine Learning, Apr-18-2022

In this study, we examine a clustering problem in which the covariates of each individual element in a dataset are associated with an uncertainty specific to that element. More specifically, we consider a clustering approach in which a pre-processing step that applies a non-linear transformation to the covariates is used to capture the hidden data structure. To this end, we approximate the sets representing the propagated uncertainty for the pre-processed features empirically. To exploit the empirical uncertainty sets, we propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over such sets, yielding more condensed clusters. As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way. The GOC algorithm demonstrates improved performance in finding sibling stars originating from the same dwarf galaxy. These realistic datasets have also been made publicly available.

artificial intelligence, data mining, machine learning, (19 more...)

2204.08205

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Day 30: 60 days of Data Science and Machine Learning Series

#artificialintelligence, Apr-4-2022, 15:09:26 GMT

This article explains what data engineers are and what their varied tasks and duties are. Seaborn is a very prominent library used during Exploratory Data Analysis of any data science project you are working upon. At times, this cohort could feel overwhelming due to the sheer volume of material I would need to learn and practice.

data science, day 30, science and machine learning series

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.37)

A Simple Guide to Machine Learning Visualisations

#artificialintelligence, Apr-4-2022, 06:01:12 GMT

The Yellowbrick library also contains a set of visualisation tools for analysing clustering algorithms. A common way to evaluate the performance of clustering models is with an intercluster distance map. The intercluster distance map plots an embedding of each cluster centre and visualises both the distance between the clusters and the relative size of each cluster based on membership. We can turn the diabetes dataset into a clustering problem by only using the features (X). Before we cluster the data we can use the popular elbow method to find the optimal number of clusters.
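
The elbow method mentioned above can be sketched without any plotting library. This is a hand-rolled k-means on synthetic 2-D blobs, not the Yellowbrick visualiser or the diabetes dataset from the article. The blob positions, the farthest-point initialisation and all function names are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic initialisation: repeatedly add the point farthest from all chosen centres."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return centres


def kmeans_inertia(points, k, n_iter=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squared distances."""
    centres = farthest_point_init(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it in place if the cluster is empty.
        centres = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centres[j]
            for j, c in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


# Synthetic data: three well-separated Gaussian blobs (hypothetical positions).
rng = random.Random(0)
blob_centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centres for _ in range(30)]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

Reading the printed values from k = 1 upwards, the "elbow" is the value of k after which the inertia stops dropping sharply; for data built from three blobs, that bend is expected at k = 3.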

diabetes dataset, machine learning visualisation, simple guide

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Hierarchical Clustering: Explain It To Me Like I'm 10

#artificialintelligence, Mar-31-2022, 17:38:47 GMT

This is part numero tres of the Explaining Machine Learning Algorithms to a 10-Year Old series. If you read the two previous ones about XGBoost Regression and K-Means Clustering, then you know the drill. We have a scary-sounding algorithm, so let's strip it of its scary bits and understand the simple intuition behind it. In the same vein as K-Means Clustering, today we are going to talk about another popular clustering algorithm -- Hierarchical Clustering. Let's say a clothing store has collected the ages of 9 of its customers, labeled C1-C9, and the amount each of them spent at the store in the last month.
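
The clothing-store setup can be illustrated with a small agglomerative (bottom-up) clustering sketch. The excerpt does not give the customers' actual ages or spending, so the numbers below are hypothetical, and single linkage on raw, unscaled values is an assumed choice rather than the article's.

```python
import math
from itertools import combinations

# Hypothetical (age, amount spent last month) for customers C1-C9.
customers = {
    "C1": (20, 30), "C2": (22, 35), "C3": (21, 32),
    "C4": (40, 120), "C5": (42, 125), "C6": (41, 118),
    "C7": (60, 60), "C8": (62, 65), "C9": (61, 58),
}


def single_linkage(a, b, points):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters):
    """Start with one cluster per customer and merge the closest pair until n_clusters remain."""
    clusters = [[name] for name in points]
    merges = []  # record of merges, in order, for building a dendrogram
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]], points),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters, merges


clusters, merges = agglomerate(customers, n_clusters=3)
groups = sorted(sorted(c) for c in clusters)
print(groups)
```

The `merges` list records which clusters were joined at each step, which is the information a dendrogram displays. In practice the two features would usually be rescaled first so that spending does not dominate the distance.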

clustering, customer, hierarchical clustering, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
