AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Explaining Deep Neural Networks using Unsupervised Clustering

Liu, Yu-han, Arik, Sercan O.

arXiv.org Artificial Intelligence, Jul-15-2020

We propose a novel method to explain trained deep neural networks (DNNs) by distilling them into surrogate models using unsupervised clustering. Our method can be applied flexibly to any subset of layers of a DNN architecture and can incorporate both low-level and high-level information. On image datasets with pre-trained DNNs, we demonstrate the strength of our method in finding similar training samples and in shedding light on the concepts the DNNs base their decisions on. Through user studies, we show that our method can improve users' trust in the model's predictions.
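As a rough illustration of a clustering-based surrogate, the sketch below assumes the hidden-layer activations of the training samples have already been clustered by some algorithm (the cluster assignments are passed in). Each cluster then predicts the DNN's majority label among its members, and an explanation for a new input is the list of training samples in its nearest cluster. The function names, the majority-vote rule, and the fidelity measure are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

def centroids_from_assignments(activations, assign, k):
    """Mean activation vector of each cluster (every cluster assumed non-empty)."""
    cents = []
    for c in range(k):
        members = [a for a, lab in zip(activations, assign) if lab == c]
        cents.append([sum(col) / len(members) for col in zip(*members)])
    return cents

def fit_surrogate(activations, assign, dnn_preds, k):
    """Hypothetical surrogate: each cluster predicts the DNN's majority label."""
    cents = centroids_from_assignments(activations, assign, k)
    cluster_label = [
        Counter(p for p, lab in zip(dnn_preds, assign) if lab == c).most_common(1)[0][0]
        for c in range(k)
    ]
    # Fidelity: fraction of training samples where the surrogate agrees with the DNN.
    fidelity = sum(cluster_label[lab] == p for lab, p in zip(assign, dnn_preds)) / len(assign)
    return cents, cluster_label, fidelity

def explain(x_act, cents, cluster_label, assign):
    """Surrogate label for x plus indices of training samples in x's nearest cluster."""
    c = min(range(len(cents)), key=lambda j: math.dist(x_act, cents[j]))
    return cluster_label[c], [i for i, lab in enumerate(assign) if lab == c]
```

For example, with two well-separated groups of activations, a new activation near the first group is explained by the training samples of that group, and the surrogate returns that group's majority DNN label.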

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2007.07477

Country: North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)


Evaluating and Validating Cluster Results

Vysala, Anupriya, Gomes, Dr. Joseph

arXiv.org Machine Learning, Jul-15-2020

Clustering is a technique for partitioning data according to its characteristics: data that are similar in nature belong to the same cluster [1]. There are two types of methods for evaluating clustering quality. External evaluation uses ground-truth labels known in advance, while internal evaluation uses only the dataset itself, without true labels. In this paper, both external and internal evaluation are performed on the clustering results for the Iris dataset. For external evaluation, Homogeneity, Completeness, and V-measure scores are calculated. For internal evaluation, the Silhouette Index and the Sum of Squared Errors are used. These internal measures, together with the dendrogram (a graphical tool from hierarchical clustering), are first used to validate the number of clusters. Finally, as a statistical tool, we used the frequency distribution method to compare, and visually represent, the distribution of observations within a clustering result and within the original data.
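The external and internal measures named above can be sketched in plain Python using their standard definitions: for true classes C and predicted clusters K, homogeneity is 1 - H(C|K)/H(C), completeness is 1 - H(K|C)/H(K), and V-measure is their harmonic mean; the silhouette of a point is (b - a)/max(a, b), and SSE sums squared distances to each cluster's centroid. This is an illustrative sketch on toy data, not the paper's code, and the Iris data are not bundled here. Tested implementations exist in scikit-learn's `sklearn.metrics` (for example `v_measure_score` and `silhouette_score`).

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())

def conditional_entropy(a, b):
    """H(A | B) estimated from two aligned label lists."""
    n = len(a)
    b_counts = Counter(b)
    return -sum((m / n) * math.log(m / b_counts[bv]) for (_, bv), m in Counter(zip(a, b)).items())

def homogeneity_completeness_v(true, pred):
    """External measures: need ground-truth labels."""
    h_c, h_k = entropy(true), entropy(pred)
    h = 1.0 if h_c == 0 else 1 - conditional_entropy(true, pred) / h_c
    c = 1.0 if h_k == 0 else 1 - conditional_entropy(pred, true) / h_k
    v = 0.0 if h + c == 0 else 2 * h * c / (h + c)  # harmonic mean of h and c
    return h, c, v

def silhouette(points, labels):
    """Internal measure: mean silhouette; needs at least two clusters, singletons score 0."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(other, []).append(math.dist(p, q))
        same = by_cluster.pop(lab, [])
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)                               # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def sse(points, labels):
    """Internal measure: sum of squared distances to each cluster's centroid."""
    total = 0.0
    for lab in set(labels):
        members = [p for p, m in zip(points, labels) if m == lab]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total
```

Note that homogeneity and completeness ignore how clusters are numbered, so a correct partition with permuted cluster IDs still scores 1 on all three external measures.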

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Machine Learning

2007.08034

Country: North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Deep Representation Learning and Clustering of Traffic Scenarios

Harmening, Nick, Biloš, Marin, Günnemann, Stephan

arXiv.org Machine Learning, Jul-15-2020

Determining the traffic scenario space is a major challenge for the homologation and coverage assessment of automated driving functions. In contrast to current approaches, which are mainly scenario-based and rely on expert knowledge, we introduce two data-driven autoencoding models that learn a latent representation of traffic scenes. The first is a CNN-based spatio-temporal model that autoencodes a grid of traffic participants' positions. The second is a purely temporal RNN-based model that autoencodes a sequence of sets. To handle the unordered set data, we incorporate the permutation invariance property. Finally, we show how the latent scenario embeddings can be used for clustering traffic scenarios and for similarity retrieval.
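Permutation invariance for set inputs is commonly obtained by applying the same per-element function to every participant and pooling with a symmetric operation such as a sum (the Deep Sets pattern). The sketch below shows that generic technique, with small fixed weights (`W_PHI`, `W_RHO`) that are hypothetical stand-ins for learned parameters; it is not the authors' RNN architecture.

```python
# Hypothetical fixed weights standing in for learned parameters.
W_PHI = [[0.5, -0.2], [0.1, 0.9], [-0.7, 0.3]]   # 3 hidden units x 2 inputs (x, y)
W_RHO = [[1.0, 0.0, 0.5], [0.0, -1.0, 0.25]]     # 2 output dims x 3 hidden units

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def encode_set(participants):
    """Encode an unordered set of (x, y) positions into a fixed-size vector."""
    pooled = [0.0] * len(W_PHI)
    for p in participants:
        h = relu(matvec(W_PHI, p))                    # phi: same map for every element
        pooled = [a + b for a, b in zip(pooled, h)]   # sum pooling: order does not matter
    return matvec(W_RHO, pooled)                      # rho: pooled summary -> embedding
```

Because the sum ignores element order and set size only affects the pooled values, sets with any number of participants map to a vector of the same length. One could encode each time step's set this way and feed the resulting sequence of vectors to a recurrent model.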

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Machine Learning

2007.0774

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > District of Columbia > Washington (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.42)

Industry:

Automobiles & Trucks (0.49)
Transportation > Ground > Road (0.35)
Information Technology > Robotics & Automation (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)
Information Technology > Artificial Intelligence > Natural Language (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)


Clustering Uber Rideshare Data - KDnuggets

#artificialintelligence, Jul-14-2020, 16:35:42 GMT

According to Gartner, by 2020 a quarter billion connected vehicles will form a major element of the Internet of Things. Connected vehicles are projected to generate 25GB of data per hour, which can be analyzed to provide real-time monitoring and apps, and will lead to new concepts of mobility and vehicle usage. Uber Technologies Inc. is a peer-to-peer ridesharing platform that connects customers with drivers who can drive to their location. Uber uses machine learning for everything from calculating prices to finding the optimal positioning of cars to maximize profits.
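A minimal k-means sketch of the kind of centroid-based clustering applied to pickup locations. The coordinates below are hypothetical, and plain Euclidean distance on latitude/longitude degrees is only a rough approximation over a small area such as one city (a haversine distance would be more accurate). Initialization uses a deterministic farthest-point rule as a simple stand-in for k-means++.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on (lat, lon) tuples; returns (centroids, labels)."""
    # Deterministic farthest-point initialization (a simple stand-in for k-means++).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pickup goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical pickups: three near Midtown Manhattan, three near JFK airport.
pickups = [(40.75, -73.99), (40.76, -73.98), (40.74, -73.99),
           (40.64, -73.78), (40.65, -73.79), (40.64, -73.77)]
centroids, labels = kmeans(pickups, k=2)
```

The resulting centroids can be read as candidate positions for staging cars near demand hotspots.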

artificial intelligence, centroid, machine learning, (10 more...)

#artificialintelligence

Country: North America > United States > New York > Richmond County > New York City (0.05)

Industry:

Transportation > Ground > Road (0.71)
Information Technology (0.71)
Transportation > Passenger (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.39)


Mixture Complexity and Its Application to Gradual Clustering Change Detection

Kyoya, Shunki, Yamanishi, Kenji

arXiv.org Machine Learning, Jul-14-2020

In model-based clustering using finite mixture models, determining the number of clusters (cluster size) is a significant challenge. The cluster size has conventionally been taken to equal the number of mixture components (mixture size); however, this may not be valid in the presence of overlaps or weight biases. In this study, we propose to measure the cluster size of a mixture model continuously through a new concept called mixture complexity (MC). MC is formally defined from the viewpoint of information theory and can be seen as a natural extension of the cluster size that accounts for overlap and weight bias. We then apply MC to the problem of gradual clustering change detection. Conventionally, clustering changes have been regarded as abrupt, induced by changes in the mixture size or cluster size. In contrast, we treat clustering changes as gradual in terms of MC, which allows changes to be detected earlier and significant changes to be distinguished from insignificant ones. We further demonstrate that MC can be decomposed according to the hierarchical structure of the mixture model, which helps us analyze its substructures in detail.
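The abstract describes MC only as information-theoretic. A minimal sketch, under our own assumption that MC can be computed as exp(H(Z) - H(Z|X)), the exponential of the mutual information between the component label Z and the data X, for a 1-D Gaussian mixture with known parameters: H(Z) comes from the mixture weights, and H(Z|X) is averaged over the posterior responsibilities of a sample. Under this reading, well-separated equal-weight components give MC close to K, while overlap or skewed weights pull MC below K. The paper should be consulted for the exact definition and estimator.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def responsibilities(x, weights, means, sigmas):
    """Posterior P(Z = k | x) for each component, via a log-sum-exp shift."""
    logs = [math.log(w) + gaussian_logpdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def mixture_complexity(xs, weights, means, sigmas):
    """ASSUMED form: exp(H(Z) - H(Z|X)), with H(Z|X) averaged over the sample xs."""
    h_z = entropy(weights)
    h_z_given_x = sum(entropy(responsibilities(x, weights, means, sigmas)) for x in xs) / len(xs)
    return math.exp(h_z - h_z_given_x)
```

With two identical, fully overlapping components the responsibilities are always (0.5, 0.5), so H(Z|X) = H(Z) and MC = 1, matching the intuition that such a mixture forms a single cluster.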

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2007.07467

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > New York (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(12 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


9 Key Machine Learning Algorithms Explained in Plain English

#artificialintelligence, Jul-12-2020, 02:49:34 GMT

Machine learning [https://gum.co/pGjwd] is changing the world. Google uses machine learning to suggest search results to users. Netflix uses it to recommend movies for you to watch. Facebook uses machine learning to suggest people you may know. Machine learning has never been more important. At the same time, understanding machine learning is hard. The field is full of jargon. And the number of different ML algorithms grows each year. This article will introduce you to the fundamental concepts

algorithm, artificial intelligence, machine learning, (16 more...)

#artificialintelligence

Industry:

Media > Television (0.48)
Information Technology > Services (0.34)
Leisure & Entertainment > Sports (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)


Unsupervised Feature Selection for Tumor Profiles using Autoencoders and Kernel Methods

Palazzo, Martin, Beauseroy, Pierre, Yankilevich, Patricio

arXiv.org Machine Learning, Jul-12-2020

Molecular data from tumor profiles are high-dimensional: tumor profiles can be characterized by tens of thousands of gene expression features. Because of the size of the gene expression feature set, machine learning methods are exposed to noisy variables and high complexity. Tumor types are heterogeneous and can be subdivided into tumor subtypes. In many cases, tumor data do not include subtype labels, so unsupervised learning methods are necessary for tumor subtype discovery. This work aims to learn meaningful, low-dimensional representations of tumor samples and to find tumor subtype clusters while preserving biological signatures, without using tumor labels. The proposed method, named Latent Kernel Feature Selection (LKFS), is an unsupervised approach for gene selection in tumor gene expression profiles. An autoencoder learns a low-dimensional, denoised latent space that serves as a target representation to guide a Multiple Kernel Learning model that selects a subset of genes. A clustering method then groups the samples using the selected genes. To evaluate the performance of the proposed unsupervised feature selection method, the obtained features and clusters are analyzed for clinical significance. The proposed method has been applied to three tumor datasets (Brain, Renal, and Lung), each composed of two tumor subtypes. Compared with benchmark unsupervised feature selection methods, the proposed method yields lower redundancy in the selected features and better clustering performance.
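The abstract reports lower redundancy in the selected features without defining the measure here. One common way to quantify redundancy, assumed for illustration rather than taken from the paper, is the mean absolute Pearson correlation over all pairs of selected genes: the lower the value, the less overlapping information the selected genes carry. The function name `redundancy` and the toy expression matrix are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)  # assumes no constant gene (va, vb > 0)

def redundancy(expr, selected):
    """Mean |Pearson r| over all pairs of selected genes (needs at least two genes).

    expr: list of samples, each a list of gene expression values.
    selected: column indices of the selected genes.
    """
    columns = [[row[g] for row in expr] for g in selected]
    pairs = list(combinations(range(len(columns)), 2))
    return sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)

# Hypothetical toy matrix: 4 samples x 3 genes; gene 1 is exactly 2 * gene 0.
expr = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]
```

Selecting genes 0 and 1 here is maximally redundant (they are perfectly correlated), while genes 0 and 2 are uncorrelated.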

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Machine Learning

2007.06106

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Europe > France (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)


Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels?

Bian, Tian, Xiao, Xi, Xu, Tingyang, Rong, Yu, Huang, Wenbing, Zhao, Peilin, Huang, Junzhou

arXiv.org Machine Learning, Jul-12-2020

Graph Identification (GI) has long been researched in graph learning and is essential in certain applications (e.g., social community detection). Specifically, GI requires predicting the label/score of a target graph given its collection of node features and edge connections. While this task is common, more complex cases arise in practice, where we need to do the inverse: for example, grouping similar users in a social network given the labels of different communities. This raises an interesting question: can we identify nodes given the labels of the graphs they belong to? This paper therefore defines a novel problem dubbed Inverse Graph Identification (IGI), as opposed to GI. After a formal discussion of the variants of IGI, we choose a particular case study of node clustering that makes use of the graph labels and node features, with the assistance of a hierarchical graph that further characterizes the connections between different graphs. To address this task, we propose the Gaussian Mixture Graph Convolutional Network (GMGCN), a simple yet effective method that performs node-level message passing with a Graph Attention Network (GAT) under the GI protocol and then infers the category of each node via a Gaussian Mixture Layer (GML). The training of GMGCN is further boosted by a proposed consensus loss that takes advantage of the structure of the hierarchical graph. Extensive experiments are conducted to test the rationality of the IGI formulation. We verify the superiority of the proposed method over other baselines on several benchmarks we have built. We will release our code along with the benchmark data to encourage further research on the IGI problem.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2007.0597

Country: North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)


AutoEmbedder: A semi-supervised DNN embedding system for clustering

Ohi, Abu Quwsar, Mridha, M. F., Safir, Farisa Benta, Hamid, Md. Abdul, Monowar, Muhammad Mostafa

arXiv.org Machine Learning, Jul-11-2020

Clustering is a widely used unsupervised learning method for unlabeled data. Deep clustering, which combines clustering with Deep Neural Network (DNN) architectures, has become a popular area of study. Deep clustering methods downsample high-dimensional data and may also incorporate a clustering loss. Deep clustering has also been introduced in semi-supervised learning (SSL). Most SSL methods depend on pairwise constraint information, a matrix indicating whether pairs of data points can belong to the same cluster. This paper introduces a novel embedding system named AutoEmbedder that downsamples higher-dimensional data to clusterable embedding points. To the best of our knowledge, this is the first research endeavor that combines a traditional classifier DNN architecture with a pairwise loss reduction technique. The training process is semi-supervised and uses a Siamese network architecture to compute the pairwise constraint loss in the feature learning phase. AutoEmbedder outperforms most existing DNN-based semi-supervised methods on well-known datasets.
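To make pairwise constraint information concrete, the sketch below builds the constraint matrix from partial labels (1 = must-link, 0 = cannot-link, None = unknown) and scores an embedding with a margin-based contrastive loss, a standard Siamese-style pairwise loss. The loss form, the `margin` value, and the function names are assumptions for illustration; AutoEmbedder's actual loss may be formulated differently.

```python
import math

def constraint_matrix(labels):
    """1 = same cluster (must-link), 0 = different (cannot-link), None = unknown pair."""
    n = len(labels)
    return [[None if labels[i] is None or labels[j] is None else int(labels[i] == labels[j])
             for j in range(n)] for i in range(n)]

def pairwise_loss(embeddings, constraints, margin=1.0):
    """Assumed contrastive loss averaged over constrained pairs only."""
    total, count = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            c = constraints[i][j]
            if c is None:
                continue  # unlabeled pairs contribute nothing
            d = math.dist(embeddings[i], embeddings[j])
            # Must-link pairs are pulled together; cannot-link pairs pushed past the margin.
            total += d ** 2 if c == 1 else max(0.0, margin - d) ** 2
            count += 1
    return total / count if count else 0.0
```

For instance, with labels `[0, 0, 1, None]`, an embedding that collapses every point to the origin is penalized for the two cannot-link pairs, while one that separates the two labeled groups by more than the margin incurs zero loss.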

artificial intelligence, autoembedder, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1016/j.knosys.2020.106190

2007.0583

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Asia > Middle East > Saudi Arabia > Mecca Province > Jeddah (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Relation-Guided Representation Learning

Kang, Zhao, Lu, Xiao, Liang, Jian, Bai, Kun, Xu, Zenglin

arXiv.org Machine Learning, Jul-11-2020

Deep auto-encoders (DAEs) have achieved great success in learning data representations thanks to the representational power of neural networks. However, most DAEs focus only on the dominant structures that suffice to reconstruct the data from a latent space, and they neglect rich latent structural information. In this work, we propose a new representation learning method that explicitly models and leverages sample relations, which are in turn used as supervision to guide the representation learning. Unlike previous work, our framework preserves the relations between samples well. Since predicting pairwise relations is itself a fundamental problem, our model learns them adaptively from data. This provides considerable flexibility in encoding the real data manifold. The importance of relation and representation learning is evaluated on the clustering task. Extensive experiments on benchmark data sets demonstrate the superiority of our approach. By seeking to embed samples into a subspace, we further show that our method can address large-scale and out-of-sample problems.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Machine Learning

2007.05742

Country:

Asia > China > Sichuan Province (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Massachusetts (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
