 Clustering


A Ubiquitous Unifying Degeneracy in Two-Body Microlensing Systems

arXiv.org Artificial Intelligence

While gravitational microlensing by planetary systems provides unique vistas on the properties of exoplanets, observations of a given 2-body microlensing event can often be interpreted with multiple distinct physical configurations. Such ambiguities are typically attributed to the close-wide and inner-outer types of degeneracies that arise from transformation invariances and symmetries of microlensing caustics. However, there remain unexplained inconsistencies between aforementioned theories and observations. Here, leveraging a fast machine learning inference framework, we present the discovery of the offset degeneracy, which concerns a magnification-matching behaviour on the lens-axis and is formulated independent of caustics. This offset degeneracy unifies the close-wide and inner-outer degeneracies, generalises to resonant topologies, and upon reanalysis, not only appears ubiquitous in previously published planetary events with 2-fold degenerate solutions, but also resolves prior inconsistencies. Our analysis demonstrates that degenerate caustics do not strictly result in degenerate magnifications and that the commonly invoked close-wide degeneracy essentially never arises in actual events. Moreover, it is shown that parameters in offset degenerate configurations are related by a simple expression. This suggests the existence of a deeper symmetry in the equations governing 2-body lenses than previously recognised.


Streaming Inference for Infinite Non-Stationary Clustering

arXiv.org Artificial Intelligence

Learning from a continuous stream of non-stationary data in an unsupervised manner is arguably one of the most common and most challenging settings facing intelligent agents. Here, we attack learning under all three conditions (unsupervised, streaming, non-stationary) in the context of clustering, also known as mixture modeling. We introduce a novel clustering algorithm that endows mixture models with the ability to create new clusters online, as demanded by the data, in a probabilistic, time-varying, and principled manner. To achieve this, we first define a novel stochastic process called the Dynamical Chinese Restaurant Process (Dynamical CRP), which is a non-exchangeable distribution over partitions of a set; next, we show that the Dynamical CRP provides a non-stationary prior over cluster assignments and yields an efficient streaming variational inference algorithm. We conclude with experiments showing that the Dynamical CRP can be applied on diverse synthetic and real data with Gaussian and non-Gaussian likelihoods.
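
For orientation, the sketch below samples a partition from the classical Chinese Restaurant Process, the exchangeable process that the abstract's Dynamical CRP builds on. It is not the paper's Dynamical CRP, whose time-varying dynamics are not specified in this abstract. The function name `sample_crp` and the concentration parameter `alpha` are illustrative assumptions.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Sample table assignments from the classical (stationary) CRP.

    Assumption: alpha > 0 is the concentration parameter. This is NOT the
    Dynamical CRP from the abstract, only the standard process it extends.
    """
    rng = random.Random(seed)
    assignments = []  # table index chosen by each customer, in arrival order
    table_sizes = []  # number of customers currently seated at each table
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)  # open a new table (a new cluster)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = sample_crp(n_customers=50, alpha=1.0, seed=1)
print(len(table_sizes), "clusters created for 50 points")
```

Because new tables can always be opened, the number of clusters is not fixed in advance, which is the property the abstract relies on when it describes creating new clusters online.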


Adversarially Learned Mixture Model

arXiv.org Machine Learning

The Adversarially Learned Mixture Model (AMM) is a generative model for unsupervised or semi-supervised data clustering. The AMM is the first adversarially optimized method to model the conditional dependence between inferred continuous and categorical latent variables. Experiments on the MNIST and SVHN datasets show that the AMM allows for semantic separation of complex data when little or no labeled data is available. The AMM achieves a state-of-the-art unsupervised clustering error rate of 2.86% on the MNIST dataset. A semi-supervised extension of the AMM yields competitive results on the SVHN dataset.


Machine Learning Algebraic Geometry for Physics

arXiv.org Machine Learning

The ubiquitous interrelations between algebraic geometry and physics have for centuries yielded fruitful phenomena in both fields. With connections made as far back as Archimedes, whose work on conic sections aided the development of concepts surrounding motion under gravity, physical understanding has largely relied upon the mathematical tools available. In the modern era, these two fields are still heavily intertwined, with particular relevance in addressing one of the most significant problems of our time: quantising gravity. String theory, as a candidate for this theory of everything, relies heavily on algebraic geometry constructions to define its spacetime and to interpret its matter. However, where new mathematical tools arise, their implementation is not always simple.


Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space

arXiv.org Artificial Intelligence

We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and having improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact Gaussian process regression (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized datasets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over other training protocols for MOB-ML, i.e., supervised regression-clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with those from the literature on the same benchmark datasets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.


5 Clustering Methods in Machine Learning

#artificialintelligence

To begin, here is an overview of some common terminology. A cluster is a group of objects that belong to the same class; in other words, objects with similar properties are grouped in one cluster, and dissimilar objects are collected in another cluster. Clustering is the process of classifying objects into a number of groups such that the objects within each group are more similar to each other than to the objects in other groups. Put simply, it means segmenting data into groups with similar properties or behaviour and assigning them to clusters. As an important analysis method in machine learning, clustering is used for identifying patterns and structure in labelled and unlabelled datasets. Clustering is an exploratory data analysis technique that can identify subgroups in data such that data points in the same subgroup (cluster) are very similar to each other and data points in separate clusters have different characteristics.


A Greedy and Optimistic Approach to Clustering with a Specified Uncertainty of Covariates

arXiv.org Machine Learning

In this study, we examine a clustering problem in which the covariates of each individual element in a dataset are associated with an uncertainty specific to that element. More specifically, we consider a clustering approach in which a pre-processing step that applies a non-linear transformation to the covariates is used to capture the hidden data structure. To this end, we empirically approximate the sets representing the propagated uncertainty for the pre-processed features. To exploit the empirical uncertainty sets, we propose a greedy and optimistic clustering (GOC) algorithm that finds better feature candidates over such sets, yielding more condensed clusters. As an important application, we apply the GOC algorithm to synthetic datasets of the orbital properties of stars generated through our numerical simulation mimicking the formation process of the Milky Way. The GOC algorithm demonstrates an improved performance in finding sibling stars originating from the same dwarf galaxy. These realistic datasets have also been made publicly available.


Day 30: 60 days of Data Science and Machine Learning Series

#artificialintelligence

This article explains what data engineers are and what their varied tasks and duties are. Seaborn is a very prominent library used during Exploratory Data Analysis of any data science project you are working upon. At times, this cohort could feel overwhelming due to the sheer volume of material I would need to learn and practice.


A Simple Guide to Machine Learning Visualisations

#artificialintelligence

The Yellowbrick library also contains a set of visualisation tools for analysing clustering algorithms. A common way to evaluate the performance of clustering models is with an intercluster distance map. The intercluster distance map plots an embedding of each cluster centre and visualises both the distance between the clusters and the relative size of each cluster based on membership. We can turn the diabetes dataset into a clustering problem by only using the features (X). Before we cluster the data we can use the popular elbow method to find the optimal number of clusters.
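
The elbow method mentioned above can be sketched without any plotting library. The example below is a rough illustration only. It does not use Yellowbrick or the diabetes dataset; instead it assumes a small hand-written k-means (`kmeans_inertia`, a hypothetical helper) and synthetic points drawn around three centres (`make_blobs`, also hypothetical). It reports the within-cluster sum of squared distances for each candidate number of clusters, and the "elbow" is where that value stops dropping sharply.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_inertia(points, k, n_iter=20):
    """Run a basic Lloyd-style k-means and return the final inertia.

    Assumption: centres are initialised with a deterministic farthest-point
    rule, which is one simple choice among many.
    """
    centres = [points[0]]
    while len(centres) < k:
        # Add the point that is farthest from all centres chosen so far.
        centres.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        )
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[j]
            for j, g in enumerate(groups)
        ]
    # Inertia: sum of squared distances from each point to its nearest centre.
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def make_blobs(seed=0):
    """Synthetic 2-D data: 30 points around each of three made-up centres."""
    rng = random.Random(seed)
    blob_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blob_centres
        for _ in range(30)
    ]


points = make_blobs()
inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 1))
```

On data like this the inertia falls steeply up to three clusters and only slightly afterwards, so three would be read off as the elbow. Real datasets rarely show such a clean bend.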


Hierarchical Clustering: Explain It To Me Like I'm 10

#artificialintelligence

This is part numero tres of the Explaining Machine Learning Algorithms to a 10-Year Old series. If you read the two previous ones about XGBoost Regression and K-Means Clustering, then you know the drill. We have a scary-sounding algorithm, so let's strip it of its scary bits and understand the simple intuition behind it. In the same vein as K-Means Clustering, today we are going to talk about another popular clustering algorithm -- Hierarchical Clustering. Let's say a clothing store has collected the ages of 9 of its customers, labeled C1-C9, and the amount each of them spent at the store in the last month.
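
A minimal sketch of the idea, under stated assumptions: the article's actual customer figures are not shown in this excerpt, so the nine (age, amount spent) pairs below are made up for illustration, and single linkage (the distance between the closest members of two clusters) is assumed as the merge rule. The function names are hypothetical.

```python
import math


def single_linkage(a, b, points):
    """Distance between clusters a and b: their closest pair of members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points, n_clusters):
    """Bottom-up hierarchical clustering that stops at n_clusters groups."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest under single linkage.
        x, y = min(
            (
                (x, y)
                for x in range(len(clusters))
                for y in range(x + 1, len(clusters))
            ),
            key=lambda pair: single_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        # Merge the second cluster into the first and drop the second.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical data standing in for customers C1-C9: (age, amount spent).
customers = [
    (20, 30), (22, 35), (21, 32),
    (40, 100), (42, 110), (41, 105),
    (60, 50), (62, 55), (61, 52),
]

print(agglomerate(customers, 3))
```

Stopping at different values of `n_clusters` corresponds to cutting the hierarchy at different heights. Asking for one cluster merges everything, and asking for nine leaves each customer alone.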