AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Face clustering with Python - PyImageSearch

#artificialintelligence Jul-13-2018, 04:06:39 GMT

Today's blog post is inspired by a question from PyImageSearch reader Leonard Bogdonoff: "Hey Adrian, can you go into identity clustering? I have a dataset of photos and I can't seem to pinpoint how I would process them to identify the unique people." Such an application of "face clustering" or "identity clustering" could be used to aid law enforcement. Consider a scenario where two perpetrators rob a bank in a busy city such as Boston or New York.

artificial intelligence, dataset, machine learning, (14 more...)

#artificialintelligence

Country: North America > United States > New York (0.24)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

On Ternary Coding and Three-Valued Logic

Kak, Subhash

arXiv.org Artificial Intelligence Jul-13-2018

Mathematically, ternary coding is more efficient than binary coding. It is little used in computation because the technology for binary processing is already established and ternary coding is more complicated to implement, but it remains relevant in algorithms that use decision trees and in communications. In this paper we present a new comparison of binary and ternary coding and compute their relative efficiencies for both number representation and decision trees. The implications of our inability to use optimal representation through mathematics or logic are examined. Apart from considerations of representation efficiency, ternary coding appears preferable to binary coding in the classification of many real-world problems of artificial intelligence (AI) and medicine. We examine the problem of identifying three appropriate classes for domain-specific applications. Keywords: optimal coding, decision trees, ternary logic, artificial intelligence. Introduction: The problem of optimal coding of numbers has been examined by many scholars (e.g.
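
One common way to make the efficiency claim concrete is radix economy, where the cost of writing a number in base b is taken as b times the number of digits. The sketch below assumes that cost model (the paper's own measure may differ), and the function names are illustrative only. Asymptotically the cost per unit of ln N is b / ln b, which is smallest at b = e, so base 3 is the best integer base under this model.

```python
import math


def radix_economy_factor(base: int) -> float:
    """Asymptotic cost per unit of ln(N) under the b * digits model: b / ln(b)."""
    return base / math.log(base)


def digits_needed(n: int, base: int) -> int:
    """Number of base-`base` digits needed to write the non-negative integer n."""
    if n == 0:
        return 1
    count = 0
    while n > 0:
        n //= base
        count += 1
    return count


def representation_cost(n: int, base: int) -> int:
    """Assumed cost model: number of symbol values times number of digit positions."""
    return base * digits_needed(n, base)


if __name__ == "__main__":
    for base in (2, 3, 4, 5):
        print(base, round(radix_economy_factor(base), 3), representation_cost(999_999, base))
```

Under this model bases 2 and 4 tie, and base 3 is slightly cheaper than both; for small numbers the ordering can fluctuate because digit counts are integers.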

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

1807.06419

Country:

Asia > India (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Amplifying state dissimilarity leads to robust and interpretable clustering of scientific data

Husic, Brooke E., Schlueter-Kuck, Kristy L., Dabiri, John O.

arXiv.org Machine Learning Jul-12-2018

Existing methods that aim to automatically cluster data into physically meaningful subsets typically require assumptions regarding the number, size, or shape of the coherent subgroups. We present a new method, simultaneous Coherent Structure Coloring (sCSC), which accomplishes the task of unsupervised clustering without a priori guidance regarding the underlying structure of the data. To illustrate the versatility of the method, we apply it to frontier physics problems at vastly different temporal and spatial scales: in a theoretical model of geophysical fluid dynamics, in laboratory measurements of vortex ring formation and entrainment, and in atomistic simulation of the Protein G system. The theoretical flow involves sparse sampling of non-equilibrium dynamics, where this new technique can find and characterize the structures that govern fluid transport using two orders of magnitude less data than required by existing methods. Application of the method to empirical measurements of vortex formation leads to the discovery of a well-defined region in which vortex ring entrainment occurs, with potential implications ranging from flow control to cardiovascular diagnostics. Finally, the protein folding example demonstrates a data-rich application governed by equilibrium dynamics, where the technique in this manuscript automatically discovers the hierarchy of distinct processes that govern protein folding and clusters protein configurations accordingly. We anticipate straightforward translation to many other fields where existing analysis tools, such as k-means and traditional hierarchical clustering, require ad hoc assumptions on the data structure or lack the interpretability of the present method. The method is also potentially generalizable to fields where the underlying processes are less accessible, such as genomics and neuroscience.

dendrogram, upstream oil & gas, vascular disease, (24 more...)

arXiv.org Machine Learning

1807.04427

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Decentralized Clustering on Compressed Data without Prior Knowledge of the Number of Clusters

Dupraz, Elsa, Pastor, Dominique, Socheleau, François-Xavier

arXiv.org Machine Learning Jul-12-2018

In sensor networks, it is not always practical to set up a fusion center. Therefore, there is a need for fully decentralized clustering algorithms. Decentralized clustering algorithms should minimize the amount of data exchanged between sensors in order to reduce sensor energy consumption. In this respect, we propose one centralized and one decentralized clustering algorithm that work on compressed data without prior knowledge of the number of clusters. In the standard K-means clustering algorithm, the number of clusters is estimated by repeating the algorithm several times, which dramatically increases the amount of exchanged data, while our algorithm can estimate this number in one run. The proposed clustering algorithms derive from a theoretical framework establishing that, under asymptotic conditions, the cluster centroids are the only fixed points of a cost function we introduce. This cost function depends on a weight function which we choose as the p-value of a Wald hypothesis test. This p-value measures the plausibility that a given measurement vector belongs to a given cluster. Experimental results show that our two algorithms are competitive in terms of clustering performance with respect to K-means and DBSCAN, while lowering by a factor of at least 2 the amount of data exchanged between sensors.
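
The K-means baseline mentioned in the abstract, where the algorithm is rerun for each candidate number of clusters, can be sketched as follows. This is not the authors' algorithm: it is plain Lloyd's K-means with a deterministic farthest-first initialisation, and the "elbow" rule used to choose K is an assumed heuristic. All names and the toy data are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans_cost(points, k, max_iters=100):
    """Run Lloyd's algorithm and return the final within-cluster sum of squares."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def pick_k_by_elbow(costs):
    """Assumed heuristic: choose the k with the largest relative cost drop from the previous k.

    Requires every cost to be strictly positive.
    """
    ks = sorted(costs)
    return max(zip(ks, ks[1:]), key=lambda pair: costs[pair[0]] / costs[pair[1]])[1]


if __name__ == "__main__":
    # Hypothetical toy data: three tight groups of five points each.
    offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
    centers = [(0, 0), (10, 0), (0, 10)]
    points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
    costs = {k: kmeans_cost(points, k) for k in (1, 2, 3, 4)}  # one full run per k
    print(costs, pick_k_by_elbow(costs))
```

Each candidate K costs one complete clustering run, which is the repeated work (and, in a sensor network, the repeated data exchange) that the abstract says a single-run estimate avoids.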

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1807.04566

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > France > Brittany > Finistère > Brest (0.04)

Genre: Research Report > Experimental Study (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Learning Neural Models for End-to-End Clustering

Meier, Benjamin Bruno, Elezi, Ismail, Amirian, Mohammadreza, Durr, Oliver, Stadelmann, Thilo

arXiv.org Machine Learning Jul-11-2018

We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters $k$, and for each $1 \leq k \leq k_\mathrm{max}$, a distribution over the individual cluster assignment for each data point. The network is trained in advance in a supervised fashion on separate data to learn grouping by any perceptual similarity criterion based on pairwise labels (same/different group). It can then be applied to different data containing different groups. We demonstrate promising performance on high-dimensional data like images (COIL-100) and speech (TIMIT). We call this "learning to cluster" and show its conceptual difference to deep metric learning, semi-supervised clustering and other related approaches while having the advantage of performing learnable clustering fully end-to-end.
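
The pairwise labels (same/different group) used as supervision can be derived from any grouping of the training examples. A minimal sketch, assuming each example carries a group identifier; the function name and data layout are illustrative and not taken from the paper:

```python
from itertools import combinations


def pairwise_labels(group_ids):
    """Map every index pair (i, j) with i < j to 1 if both share a group, else 0."""
    return {
        (i, j): int(group_ids[i] == group_ids[j])
        for i, j in combinations(range(len(group_ids)), 2)
    }


if __name__ == "__main__":
    # Hypothetical batch: two examples of speaker "a", one of speaker "b".
    print(pairwise_labels(["a", "a", "b"]))
```

Only group membership matters here, not the identity of the groups, which is consistent with applying the trained network to data containing different groups.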

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

1807.04001

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany (0.04)
Europe > Italy > Veneto > Venice (0.04)
(2 more...)

Genre:

Overview (0.68)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Recurrent Auto-Encoder Model for Large-Scale Industrial Sensor Signal Analysis

Wong, Timothy, Luo, Zhiyuan

arXiv.org Machine Learning Jul-10-2018

A recurrent auto-encoder model summarises sequential data through an encoder structure into a fixed-length vector and then reconstructs the original sequence through the decoder structure. The summarised vector can be used to represent time series features. In this paper, we propose relaxing the dimensionality of the decoder output so that it performs partial reconstruction. The fixed-length vector therefore represents features in the selected dimensions only. In addition, we propose using a rolling fixed-window approach to generate training samples from unbounded time series data. The change of time series features over time can be summarised as a smooth trajectory path. The fixed-length vectors are further analysed using additional visualisation and unsupervised clustering techniques. The proposed method can be applied in large-scale industrial processes for sensor signal analysis, where clusters of the vector representations can reflect the operating states of the industrial system.
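
The rolling fixed-window step can be sketched as a generator that turns a possibly unbounded stream of readings into fixed-length training samples. The window width, the stride, and the function name are assumptions for illustration; the paper's exact sampling scheme may differ.

```python
from collections import deque


def rolling_windows(stream, width, stride=1):
    """Yield consecutive fixed-length windows from an iterable of readings."""
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    window = deque(maxlen=width)  # oldest reading is dropped automatically
    for index, value in enumerate(stream):
        window.append(value)
        # Emit once the window is full, then again after every `stride` new readings.
        if index + 1 >= width and (index + 1 - width) % stride == 0:
            yield list(window)


if __name__ == "__main__":
    for sample in rolling_windows(range(6), width=3, stride=1):
        print(sample)
```

Because it is a generator, only the current window is held in memory, so the source never has to be fully loaded.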

artificial intelligence, neural network, recurrent auto-encoder model, (18 more...)

arXiv.org Machine Learning

1807.0371

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Automatic trajectory recognition in Active Target Time Projection Chambers data by means of hierarchical clustering

Dalitz, Christoph, Ayyad, Yassid, Wilberg, Jens, Aymans, Lukas, Bazin, Daniel, Mittig, Wolfgang

arXiv.org Machine Learning Jul-10-2018

The automatic reconstruction of three-dimensional particle tracks from Active Target Time Projection Chambers data can be a challenging task, especially in the presence of noise. In this article, we propose a nonparametric algorithm that is based on the idea of clustering point triplets instead of the original points. We define an appropriate distance measure on point triplets and then apply a single-link hierarchical clustering on the triplets. Compared to parametric approaches like RANSAC or the Hough transform, the new algorithm has the advantage of potentially finding trajectories even of shapes that are not known beforehand. This feature is particularly important in low-energy nuclear physics experiments with AT operating inside a magnetic field. The algorithm has been validated using data from experiments performed with the Active Target Time Projection Chamber (AT-TPC) at the National Superconducting Cyclotron Laboratory (NSCL). The results demonstrate the capability of the algorithm to identify and isolate particle tracks that describe non-analytical trajectories. For curved tracks, the vertex detection recall was 86% and the precision 94%. For straight tracks, the vertex detection recall was 96% and the precision 98%. In the case of a test set containing only straight linear tracks, the algorithm performed better than an iterative Hough transform. Keywords: Time Projection Chambers, Active Target, Pattern Recognition, Clustering. 1. Introduction: One of the present aims of modern low-energy nuclear physics is to provide a more complete understanding about the behavior of subatomic matter under large isospin (i.e.
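
Single-link hierarchical clustering cut at a distance threshold is equivalent to finding the connected components of the graph that links every pair of items closer than the threshold. The sketch below illustrates that idea on raw points with Euclidean distance; it does not reproduce the paper's point triplets or its triplet distance measure, and the names and threshold are hypothetical.

```python
import math


def single_link_clusters(points, threshold):
    """Group point indices whose chains of pairwise distances stay within `threshold`."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    track = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
    print(single_link_clusters(track, threshold=1.5))
```

In the example, points 0 and 2 are farther apart than the threshold but still end up together because point 1 links them. This chaining behaviour is what lets single-link clustering follow elongated or curved structures whose shape is not known beforehand.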

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1807.03513

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas > Brooks County (0.04)
North America > United States > Michigan > Ingham County > Lansing (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Symbol Emergence in Cognitive Developmental Systems: a Survey

Taniguchi, Tadahiro, Ugur, Emre, Hoffmann, Matej, Jamone, Lorenzo, Nagai, Takayuki, Rosman, Benjamin, Matsuka, Toshihiko, Iwahashi, Naoto, Oztop, Erhan, Piater, Justus, Wörgötter, Florentin

arXiv.org Artificial Intelligence Jul-10-2018

Humans use signs, e.g., sentences in a spoken language, for communication and thought. Hence, symbol systems like language are crucial for our communication with other agents and adaptation to our real-world environment. The symbol systems we use in our human society adaptively and dynamically change over time. In the context of artificial intelligence (AI) and cognitive systems, the symbol grounding problem has been regarded as one of the central problems related to symbols. However, the symbol grounding problem was originally posed to connect symbolic AI and sensorimotor information and did not consider many interdisciplinary phenomena in human communication and dynamic symbol systems in our society, which semiotics considered. In this paper, we focus on the symbol emergence problem, addressing not only cognitive dynamics but also the dynamics of symbol systems in society, rather than the symbol grounding problem. We first introduce the notion of a symbol in semiotics from the humanities, to leave the very narrow idea of symbols in symbolic AI. Furthermore, over the years, it became more and more clear that symbol emergence has to be regarded as a multifaceted problem. Therefore, secondly, we review the history of the symbol emergence problem in different fields, including both biological and artificial systems, showing their mutual relations. We summarize the discussion and provide an integrative viewpoint and comprehensive overview of symbol emergence in cognitive systems. Additionally, we describe the challenges facing the creation of cognitive systems that can be part of symbol emergence systems.

machine learning, reinforcement learning, symbol system, (19 more...)

arXiv.org Artificial Intelligence

1801.08829

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
(26 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Personal (0.92)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(6 more...)

The 5 Clustering Algorithms Data Scientists Need to Know

#artificialintelligence Jul-9-2018, 00:25:51 GMT

Clustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields. In Data Science, we can use clustering analysis to gain some valuable insights from our data by seeing what groups the data points fall into when we apply a clustering algorithm.

algorithm, artificial intelligence, machine learning, (14 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Pairwise Covariates-adjusted Block Model for Community Detection

Huang, Sihan, Feng, Yang

arXiv.org Machine Learning Jul-9-2018

One of the most fundamental problems in network study is community detection. The stochastic block model (SBM) is a widely used model for network data, for which various estimation methods have been developed and their community detection consistency results established. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce the pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to efficiently solve PCABM. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is community detection consistent. PCABM compares favorably with the SBM or degree-corrected stochastic block model (DCBM) under a wide range of simulated and real networks when covariate information is accessible.
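
The stochastic equivalence assumption of the plain SBM can be seen in a small sampler: whether two nodes are connected depends only on the communities they belong to. The sketch below covers the standard Bernoulli SBM only, not the covariate-adjusted PCABM; the function name, the seed, and the probability matrix are hypothetical.

```python
import random


def sample_sbm(communities, block_probs, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model.

    communities[i] is the community index of node i; block_probs[a][b] is the
    edge probability between a node in community a and a node in community b.
    """
    rng = random.Random(seed)
    n = len(communities)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph, no self-loops
            if rng.random() < block_probs[communities[i]][communities[j]]:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


if __name__ == "__main__":
    # Dense within communities, sparse between them.
    probs = [[0.9, 0.1], [0.1, 0.9]]
    for row in sample_sbm([0, 0, 0, 1, 1, 1], probs):
        print(row)
```

Any node-pair information beyond community membership, such as the pairwise covariates the abstract describes, has no way to enter this sampler, which is the restriction the PCABM is meant to relax.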

covariate, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1807.03469

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Greenland (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.50)

Industry: Education > Educational Setting (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
(2 more...)
