AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Ultrametric Fitting by Gradient Descent

Chierchia, Giovanni, Perret, Benjamin

Neural Information Processing Systems, Mar-18-2020, 21:46:50 GMT

We study the problem of fitting an ultrametric distance to a dissimilarity graph in the context of hierarchical cluster analysis. Standard hierarchical clustering methods are specified procedurally, rather than in terms of the cost function to be optimized. We aim to overcome this limitation by presenting a general optimization framework for ultrametric fitting. Our approach models ultrametric fitting as a constrained optimization problem over the continuous space of ultrametrics. In doing so, we can leverage the simple yet effective idea of replacing the ultrametric constraint with a min-max operation injected directly into the cost function.
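A classical example of a min-max operation that turns any dissimilarity into an ultrametric is the subdominant ultrametric: each pairwise value is replaced by the smallest achievable "largest hop" over all paths joining the two points, which equals the single-linkage merge height. The sketch below computes it with a Floyd-Warshall-style minimax closure in O(n^3) time. It assumes a dense, symmetric dissimilarity matrix with zero diagonal, and it only illustrates the kind of operation the abstract refers to; it is not the authors' gradient-descent framework.

```python
from typing import List

Matrix = List[List[float]]


def subdominant_ultrametric(d: Matrix) -> Matrix:
    """Return the subdominant ultrametric of a symmetric dissimilarity matrix.

    Each entry u[i][j] is the minimum, over all paths from i to j, of the
    maximum dissimilarity along the path (a minimax / bottleneck distance).
    """
    n = len(d)
    u = [row[:] for row in d]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # A detour through k costs the larger of its two hops.
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def is_ultrametric(u: Matrix, tol: float = 1e-12) -> bool:
    """Check the strong triangle inequality u[i][j] <= max(u[i][k], u[k][j])."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j]) + tol
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )
```

Because the result never exceeds the input and satisfies the strong triangle inequality, cutting it at any threshold yields nested flat clusterings, which is what makes ultrametrics a natural encoding of hierarchies.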

cost function, gradient descent, ultrametric fitting

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.44)


Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Zhu, Hanlin, Li, Xue, Sun, Liuyang, He, Fei, Zhao, Zhengtuo, Luan, Lan, Tran, Ngoc Mai, Xie, Chong

arXiv.org Machine Learning, Mar-18-2020

Across many areas, from neural tracking to database entity resolution, manual assessment of clusters by human experts is a bottleneck in the rapid development of scalable and specialized clustering methods. To solve this problem, we develop C-FAR, a novel method for Fast, Automated and Reproducible assessment of multiple hierarchical clustering algorithms simultaneously. Our algorithm takes any number of hierarchical clustering trees as input, strategically queries pairs for human feedback, and outputs an optimal clustering among those nominated by these trees. While it is applicable to large datasets in any domain that uses pairwise comparisons for assessment, our flagship application is the cluster aggregation step in spike sorting, the task of assigning waveforms (spikes) in recordings to neurons. On simulated data of 96 neurons under adverse conditions, including drifting and 25% blackout, our algorithm produces near-perfect tracking relative to the ground truth. Its runtime scales linearly in the number of input trees, making it a competitive computational tool. These results indicate that C-FAR is highly suitable as a model selection and assessment tool in clustering tasks.
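To make "queries pairs for human feedback" concrete, the sketch below scores candidate flat clusterings by how many expert-labelled pairs they reproduce (same neuron vs. different neurons) and returns the best candidate. This is a simplified illustration, not C-FAR itself: the dictionary format for clusterings, the `(a, b, same)` label triples and the plain agreement count are hypothetical, and the paper's strategy for choosing which pairs to query is not reproduced.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical formats: a flat clustering maps each item to a cluster label,
# and an expert answer is (item_a, item_b, same_cluster).
Clustering = Dict[Hashable, int]
PairLabel = Tuple[Hashable, Hashable, bool]


def agreement(clustering: Clustering, labels: Sequence[PairLabel]) -> int:
    """Count queried pairs whose same/different answer the clustering reproduces."""
    return sum((clustering[a] == clustering[b]) == same for a, b, same in labels)


def select_clustering(candidates: List[Clustering], labels: Sequence[PairLabel]) -> int:
    """Return the index of the candidate that agrees with the most expert labels.

    max() returns the first maximal element, so ties favour earlier candidates.
    """
    scores = [agreement(c, labels) for c in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])
```

In a tree-based setting, each candidate would typically be one cut of one input hierarchy, so the same few labelled pairs can rank many cuts at once.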

algorithm, automated and reproducible assessment, neuron

2003.08533

Country:

Europe > Italy (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Finding parking spots with Custom Vision and IoT

#artificialintelligence, Mar-17-2020, 13:34:59 GMT

Machine learning problems can generally be divided into three types: classification and regression, which are known as supervised learning, and unsupervised learning, which in the context of machine learning applications often refers to clustering. In the following article, I am going to give a brief introduction to each of these three problems and include a walkthrough in the popular Python library scikit-learn.
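As a reminder of what clustering does in practice, here is a minimal k-means (Lloyd's algorithm) in plain Python rather than scikit-learn, so that the assignment and update steps are visible. It is an illustrative sketch, not code from the article; scikit-learn users would normally reach for `sklearn.cluster.KMeans` instead.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    # Initialise centroids with k data points sampled without replacement.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` puts the three points near the origin in one cluster and the other three in the other; which cluster gets label 0 depends on the random initialisation.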

algorithm, dataset, learning

Industry:

Education > Focused Education > Special Education (0.46)
Transportation > Infrastructure & Services (0.40)
Transportation > Ground > Road (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)


An Automatic Attribute Based Access Control Policy Extraction from Access Logs

Karimi, Leila, Aldairi, Maryam, Joshi, James, Abdelhakim, Mai

arXiv.org Artificial Intelligence, Mar-17-2020

With the rapid advances in computing and information technologies, traditional access control models have become inadequate for capturing the fine-grained and expressive security requirements of newly emerging applications. An attribute-based access control (ABAC) model provides a more flexible approach to addressing the authorization needs of complex and dynamic systems. While organizations are interested in employing newer authorization models, migrating to such models poses a significant challenge. Many large-scale businesses need to grant authorization to user populations that are potentially distributed across disparate and heterogeneous computing environments, each of which may have its own access control model. The manual development of a single policy framework for an entire organization is tedious, costly, and error-prone. In this paper, we present a methodology for automatically learning ABAC policy rules from a system's access logs to simplify the policy development process. The proposed approach employs an unsupervised learning-based algorithm to detect patterns in access logs and extract ABAC authorization rules from these patterns. In addition, we present two policy improvement algorithms, rule pruning and policy refinement, to generate a higher-quality mined policy. Finally, we implement a prototype of the proposed approach to demonstrate its feasibility.
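An ABAC rule can be pictured as a conjunction of attribute conditions on the user, the resource and the operation; a logged request is covered when every condition matches. The sketch below shows that representation and one simple pruning criterion: a rule is redundant when another rule's conditions are a proper subset of its own, because the more general rule already permits everything it permits. The attribute names, the dictionary rule format and this pruning criterion are assumptions for illustration; the paper's mining and pruning algorithms are not reproduced.

```python
from typing import Dict, List, Mapping

# Hypothetical rule format: required attribute values, e.g.
# {"user.dept": "hr", "resource.type": "payroll", "op": "read"}.
Rule = Dict[str, str]


def covers(rule: Rule, request: Mapping[str, str]) -> bool:
    """A rule permits a request when every attribute condition matches."""
    return all(request.get(attr) == value for attr, value in rule.items())


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop rules that are strictly more specific than another rule in the list.

    If rule b's conditions are a proper subset of rule a's, every request that a
    permits is also permitted by b, so a adds nothing to the policy.
    """
    # dict.items() views support set comparisons; "<" means proper subset.
    return [a for a in rules if not any(b.items() < a.items() for b in rules)]
```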

artificial intelligence, data mining, machine learning

2003.0727

Country:

North America > United States > Michigan (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)


Directionally Dependent Multi-View Clustering Using Copula Model

Afrin, Kahkashan, Iquebal, Ashif S., Karimi, Mostafa, Souris, Allyson, Lee, Se Yoon, Mallick, Bani K.

arXiv.org Machine Learning, Mar-16-2020

In recent biomedical research, a fundamental problem is to cluster a set of objects integratively using datasets from multiple sources. Such problems are encountered mostly in genomics, where data are collected from various sources and typically represent distinct yet complementary information. Integrating these data sources for multi-source clustering is challenging because of their complex dependence structure, including directional dependence. In genomics studies in particular, it is known that there is a certain directional dependence between DNA expression, DNA methylation, and RNA expression, widely called the Central Dogma. Most existing multi-view clustering methods assume either an independent structure or pairwise (non-directional) dependence, thereby ignoring the directional relationship. Motivated by this, we propose a copula-based multi-view clustering model in which a copula enables the model to accommodate the directional dependence present in the datasets. We conduct a simulation experiment in which the simulated datasets exhibit inherent directional dependence; it shows that ignoring the directional dependence degrades clustering performance. As a real application, we apply our model to breast cancer tumor samples collected from The Cancer Genome Atlas (TCGA).
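Copula models separate each variable's marginal distribution from the dependence between variables. A common first step, shown below as a generic sketch rather than the authors' model, is to map each variable to pseudo-observations on (0, 1) via ranks, u_i = rank(x_i) / (n + 1); the dependence structure, directional or not, is then modelled on these uniform-margin values. Breaking ties by original position is a simplifying assumption here.

```python
from typing import List, Sequence


def pseudo_observations(xs: Sequence[float]) -> List[float]:
    """Rank-transform a sample to (0, 1): u_i = rank(x_i) / (n + 1).

    Ranks start at 1; sorted() is stable, so ties are broken by original position.
    """
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])  # indices from smallest to largest
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u
```

Because only ranks are kept, any strictly increasing transformation of a variable (for example, a log-transform of expression values) leaves its pseudo-observations unchanged, which is why copulas capture dependence independently of the marginals.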

dataset, dependence, directional dependence

2003.07494

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Unsupervised machine learning of quantum phase transitions using diffusion maps

Lidiak, Alexander, Gong, Zhexuan

arXiv.org Machine Learning, Mar-16-2020

Experimental quantum simulators have become large and complex enough that discovering new physics from the huge amount of measurement data can be quite challenging, especially when little theoretical understanding of the simulated model is available. Unsupervised machine learning methods are particularly promising for overcoming this challenge. For the specific task of learning quantum phase transitions, unsupervised machine learning methods have primarily been developed for phase transitions characterized by simple order parameters, typically linear in the measured observables. However, such methods often fail for more complicated phase transitions, such as those involving incommensurate phases, valence-bond solids, topological order, and many-body localization. We show that the diffusion map method, which performs nonlinear dimensionality reduction and spectral clustering of the measurement data, has significant potential for learning such complex phase transitions in an unsupervised manner. This method works for measurements of local observables in a single basis and is thus readily applicable to many experimental quantum simulators as a versatile tool for learning various quantum phases and phase transitions.
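The diffusion map idea can be sketched in a few steps: build a Gaussian kernel between measurement snapshots, normalise it into a Markov transition matrix, and use its leading nontrivial eigenvector as a nonlinear coordinate whose sign separates two well-separated groups. The pure-Python version below works with the symmetric matrix S = D^{-1/2} K D^{-1/2}, which has the same eigenvalues as the Markov matrix D^{-1} K, and finds its second eigenvector by power iteration after projecting out the trivial one. The bandwidth `eps`, the fixed iteration count and the two-way sign rule are illustrative assumptions, not the settings used in the paper.

```python
import math
from typing import List, Sequence


def diffusion_split(
    points: Sequence[Sequence[float]], eps: float = 1.0, iters: int = 200
) -> List[int]:
    """Two-way spectral split from the first nontrivial diffusion-map coordinate."""
    n = len(points)
    # Gaussian kernel K_ij = exp(-|x_i - x_j|^2 / eps).
    k = [[math.exp(-math.dist(p, q) ** 2 / eps) for q in points] for p in points]
    deg = [sum(row) for row in k]
    # Symmetric normalisation S = D^{-1/2} K D^{-1/2}, similar to D^{-1} K.
    s = [[k[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The trivial eigenvector of S (eigenvalue 1) is proportional to sqrt(deg).
    v1 = [math.sqrt(d) for d in deg]
    norm1 = math.sqrt(sum(x * x for x in v1))
    v1 = [x / norm1 for x in v1]

    # Power iteration on S restricted to the orthogonal complement of v1.
    v = [float(i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]  # project out the trivial direction
        v = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    # Right eigenvector of the Markov matrix: psi = D^{-1/2} v.
    psi = [x / math.sqrt(d) for x, d in zip(v, deg)]
    return [0 if x >= 0 else 1 for x in psi]
```

Power iteration is valid here because a Gaussian kernel matrix is positive semidefinite, so after removing the trivial direction the eigenvalue of largest magnitude is the second-largest eigenvalue of S.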

diffusion map, phase transition, transition

2003.07399

Country:

North America > United States > Colorado > Jefferson County > Golden (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Asia > Japan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)


Characterising hot stellar systems with confidence

Chattopadhyay, Souradeep, Maitra, Ranjan

arXiv.org Machine Learning, Mar-16-2020

Hot stellar systems (HSS) are collections of stars bound together by gravitational attraction. These systems hold clues to many mysteries of outer space, so understanding their origin, evolution and physical properties is important but remains a huge challenge. We used multivariate t-mixture model-based clustering to analyze 13456 hot stellar systems from Misgeld & Hilker (2011), including 12763 candidate globular clusters, and found eight homogeneous groups using the Bayesian Information Criterion (BIC). A nonparametric bootstrap procedure was used to estimate the confidence of each of our clustering assignments. The eight groups can be characterized in terms of correlation, mass, effective radius and surface density. Using the conventional correlation-mass-effective radius-surface density notation, the largest group, Group 1, can be described as having positive-low-low-moderate characteristics. The other groups, numbered in decreasing size, are characterized similarly: Group 2 has positive-low-low-high characteristics, Group 3 positive-low-low-moderate, Group 4 positive-low-low-high, Group 5 positive-low-moderate-moderate, and Group 6 positive-moderate-low-high characteristics. The smallest group (Group 8) shows negative-low-moderate-moderate characteristics. Group 7 contains no candidate clusters and so cannot be labeled in the same way, but the mass-effective radius correlation for these non-candidates indicates that they are larger than typical globular clusters. Assertions drawn for each group are ambiguous for a few HSS with low classification confidence. Our analysis identifies distinct kinds of HSS with varying confidence and provides novel insight into their physical and evolutionary properties.
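Choosing the number of groups with BIC amounts to fitting mixtures with different numbers of components and keeping the one with the best penalised likelihood. The sketch below uses the convention BIC = -2 log L + m log n (smaller is better) and counts the free parameters m of a K-component, p-variate t-mixture with unrestricted scale matrices and one degrees-of-freedom parameter per component. Both the sign convention and that parameterisation are assumptions, since the abstract does not state them, and the fitted log-likelihoods would come from an EM routine that is not shown.

```python
import math
from typing import Dict


def t_mixture_params(k: int, p: int) -> int:
    """Free parameters of a k-component, p-variate t-mixture (assumed parameterisation)."""
    weights = k - 1                # mixing proportions sum to one
    means = k * p                  # one location vector per component
    scales = k * p * (p + 1) // 2  # one symmetric p x p scale matrix per component
    dofs = k                       # one degrees-of-freedom parameter per component
    return weights + means + scales + dofs


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 log L + m log n; lower values are preferred under this convention."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def choose_k(log_likelihoods: Dict[int, float], p: int, n_obs: int) -> int:
    """Return the number of components with the smallest BIC."""
    return min(
        log_likelihoods,
        key=lambda k: bic(log_likelihoods[k], t_mixture_params(k, p), n_obs),
    )
```

The penalty term grows with K, so a larger K is selected only when its gain in log-likelihood outweighs the cost of the extra parameters.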

classification, group 1, hss

2003.05777

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)


Towards automated kernel selection in machine learning systems: A SYCL case study

Lawson, John

arXiv.org Machine Learning, Mar-15-2020

Automated tuning of compute kernels is a popular area of research, mainly focused on finding optimal kernel parameters for a problem with fixed input sizes. This approach works well for deploying machine learning models, where the network topology is constant, but machine learning research often involves changing network topologies and hyperparameters. Traditional kernel auto-tuning has limited impact in this case; a more general selection of kernels is required for libraries to accelerate machine learning research. In this paper, we present initial results using machine learning to select kernels in a case study deploying high-performance SYCL kernels in libraries that target a range of heterogeneous devices, from desktop GPUs to embedded accelerators. The techniques investigated apply more generally and could similarly be integrated with other heterogeneous programming systems. By combining auto-tuning and machine learning, these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
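One simple form of learned kernel selection is a nearest-neighbour lookup: auto-tune a handful of representative problem sizes offline, record the fastest kernel configuration for each, and at run time reuse the configuration of the closest tuned size. The sketch below illustrates that idea with hypothetical matrix-multiply shapes and configuration names; the paper's actual features, models and SYCL kernels are not reproduced, and the log-scale distance is an assumption that treats a doubling of any dimension as equally significant.

```python
import math
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # hypothetical (M, N, K) of a matrix multiply


def select_kernel(tuned: Dict[Shape, str], shape: Shape) -> str:
    """Return the configuration tuned for the nearest shape in log2 space."""

    def distance(a: Shape, b: Shape) -> float:
        # Compare sizes on a log scale so 64 vs 128 counts like 1024 vs 2048.
        return math.dist([math.log2(x) for x in a], [math.log2(x) for x in b])

    nearest = min(tuned, key=lambda s: distance(s, shape))
    return tuned[nearest]
```

A 1-nearest-neighbour rule needs no training beyond the tuning runs themselves, which fits the goal of low developer effort; a learned classifier over the same features is the natural next step when the table grows.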

artificial intelligence, configuration, machine learning

2003.06795

Country: North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)


Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0

Hodari, Zack, Lai, Catherine, King, Simon

arXiv.org Machine Learning, Mar-14-2020

In English, prosody adds a broad range of information to segment sequences, from information structure (e.g. contrast) to stylistic variation (e.g. expression of emotion). However, when learning to control prosody in text-to-speech voices, it is not clear what exactly the control is modifying. Existing research on discrete representation learning for prosody has demonstrated high naturalness, but no analysis has been performed on what these representations capture, or whether they can generate meaningfully distinct variants of an utterance. We present a phrase-level variational autoencoder with a multi-modal prior, using the mode centres as "intonation codes". Our evaluation establishes which intonation codes are perceptually distinct, finding that the intonation codes from our multi-modal latent model were significantly more distinct than those of a baseline using k-means clustering. We carry out a follow-up qualitative study to determine what information the codes carry. Most commonly, listeners commented on the intonation codes having a statement or question style. However, many other affect-related styles were also reported, including emotional, uncertain, surprised, sarcastic, passive aggressive, and upset.
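With a multi-modal prior, each mode centre can serve as a discrete code, and a continuous latent vector can be mapped to the code of its most responsible mode. The sketch below assumes an equally weighted mixture of isotropic Gaussians with a shared standard deviation `sigma`, which is an illustrative choice rather than the paper's exact prior; it returns the log prior density of a latent vector together with the index of the mode with the highest posterior responsibility.

```python
import math
from typing import List, Sequence, Tuple


def mixture_prior(
    z: Sequence[float], centres: List[Sequence[float]], sigma: float = 1.0
) -> Tuple[float, int]:
    """Log density of z under an equal-weight isotropic Gaussian mixture, plus its code.

    The code is the index of the component with the largest posterior responsibility.
    """
    d = len(z)
    log_norm = -0.5 * d * math.log(2 * math.pi * sigma ** 2)
    # Per-component log of (1/K) * N(z; mu_k, sigma^2 I).
    comps = [
        log_norm - math.dist(z, mu) ** 2 / (2 * sigma ** 2) - math.log(len(centres))
        for mu in centres
    ]
    top = max(comps)
    # log-sum-exp over components, shifted by the maximum for numerical stability.
    log_p = top + math.log(sum(math.exp(c - top) for c in comps))
    code = comps.index(top)
    return log_p, code
```

With equal weights and a shared variance, the most responsible mode is simply the nearest centre, so in this simplified setting the code assignment coincides with a k-means-style nearest-centroid rule; the difference from a k-means baseline lies in how the centres and the decoder are learned.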

intonation code, rendition, representation

2003.06686

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)


DHOG: Deep Hierarchical Object Grouping

Darlow, Luke Nicholas, Storkey, Amos

arXiv.org Machine Learning, Mar-13-2020

Recently, a number of competitive methods have tackled unsupervised representation learning by maximising the mutual information between the representations produced from augmentations. The resulting representations are then invariant to stochastic augmentation strategies and can be used for downstream tasks such as clustering or classification. Yet data augmentations preserve many properties of an image, so there is potential for a suboptimal choice of representation that relies on matching easy-to-find features in the data. We demonstrate that greedy or local methods of maximising mutual information (such as stochastic gradient optimisation) discover local optima of the mutual information criterion; the resulting representations are also less well suited to complex downstream tasks. Earlier work has not specifically identified or addressed this issue. We introduce deep hierarchical object grouping (DHOG), which computes a number of distinct discrete representations of images in a hierarchical order, eventually generating representations that better optimise the mutual information objective. We also find that these representations align better with the downstream task of grouping into underlying object classes. We tested DHOG on unsupervised clustering, which is a natural downstream test because the target representation is a discrete labelling of the data. We achieved new state-of-the-art results on the three main benchmarks without any prefiltering or Sobel-edge detection, which proved necessary for many previous methods to work. We obtain accuracy improvements of 4.3% on CIFAR-10, 1.5% on CIFAR-100-20, and 7.2% on SVHN.
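Maximising mutual information between discrete representations of two augmentations is often implemented, as in IIC-style objectives, by averaging the outer products of the two views' cluster-probability vectors into a joint distribution over cluster pairs and computing its mutual information. The sketch below computes that quantity for a batch of paired probability vectors. It is an assumption that DHOG's per-head objective takes exactly this form, and the hierarchical ordering of heads is not shown.

```python
import math
from typing import Sequence


def paired_mutual_information(
    p: Sequence[Sequence[float]], q: Sequence[Sequence[float]]
) -> float:
    """Mutual information (in nats) of the joint cluster distribution of two views.

    p[i] and q[i] are probability vectors over the same k clusters for the two
    augmentations of example i.
    """
    n, k = len(p), len(p[0])
    # Joint P[a][b] = batch mean of p_i[a] * q_i[b], then symmetrised.
    joint = [[sum(p[i][a] * q[i][b] for i in range(n)) / n for b in range(k)] for a in range(k)]
    joint = [[(joint[a][b] + joint[b][a]) / 2 for b in range(k)] for a in range(k)]
    row = [sum(joint[a]) for a in range(k)]
    col = [sum(joint[a][b] for a in range(k)) for b in range(k)]
    mi = 0.0
    for a in range(k):
        for b in range(k):
            if joint[a][b] > 0:  # 0 * log 0 is taken as 0
                mi += joint[a][b] * math.log(joint[a][b] / (row[a] * col[b]))
    return mi
```

The maximum, log k, is reached when both views assign each example confidently to the same cluster and all clusters are used equally often; this is what pushes such objectives away from degenerate single-cluster solutions.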

augmentation, dhog, representation

2003.08821

Country: Europe > United Kingdom (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
