AITopics

doi: 10.1109/TKDE.2019.2954133

1911.10293

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence Nov-21-2019, 01:03:44 GMT

Clustering Metrics Better Than the Elbow Method - KDnuggets

Clustering is an important part of the machine learning pipeline for business or scientific enterprises utilizing data science. As the name suggests, it helps to identify congregations of closely related (by some measure of distance) data points in a blob of data that would otherwise be difficult to make sense of. However, the process of clustering mostly falls under the realm of unsupervised machine learning, and unsupervised ML is a messy business. There are no known answers or labels to guide the optimization process or to measure our success against.

algorithm, cluster center, elbow method, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.80)

Qian, Weizhu, Lauri, Fabrice, Gechter, Franck

A Probabilistic Approach for Discovering Daily Human Mobility Patterns with Mobile Data

Discovering human mobility patterns with geo-location data collected from smartphone users has been a hot research topic in recent years. In this paper, we attempt to discover daily mobility patterns based on GPS data. We view this problem from a probabilistic perspective in order to extract more information from the original GPS data than conventional methods do. A nonparametric Bayesian modeling method, the Infinite Gaussian Mixture Model (IGMM), is used to estimate the probability density of daily mobility. Then, we use Kullback-Leibler divergence as the metric to measure the similarity of different probability distributions. Combining the Infinite Gaussian Mixture Model and Kullback-Leibler divergence, we derive an automatic clustering algorithm to discover mobility patterns for each individual user without setting the number of clusters in advance. In the experiments, the effectiveness of our method is validated on real data collected from different users. The results show that the IGMM-based algorithm outperforms the GMM-based algorithm. We also test our methods on datasets of different lengths to discover the minimum data length for discovering mobility patterns. INTRODUCTION. Smartphone devices are equipped with multiple sensors that can record user behavior on the handsets. With the help of large-scale smartphone usage data, researchers are able to study human behavior in the real world.
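As a rough illustration of the similarity measure named in the abstract above, the sketch below computes Kullback-Leibler divergence in the one case where it has a simple closed form: two univariate Gaussians. This is a simplifying assumption, not the paper's procedure; the densities in the abstract are Gaussian mixtures, for which KL divergence has no closed form and has to be approximated. The function names `kl_gaussian` and `symmetric_kl` are hypothetical.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for p = N(mu_p, sigma_p^2) and q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


def symmetric_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL is asymmetric; summing both directions gives a symmetric dissimilarity."""
    return (
        kl_gaussian(mu_p, sigma_p, mu_q, sigma_q)
        + kl_gaussian(mu_q, sigma_q, mu_p, sigma_p)
    )


if __name__ == "__main__":
    # Two unit-variance Gaussians whose means differ by 1.
    print(kl_gaussian(0.0, 1.0, 1.0, 1.0))
    print(symmetric_kl(0.0, 1.0, 1.0, 1.0))
```

A small divergence between two days' densities would suggest that they belong to the same mobility pattern. Whether a symmetrized form is used is not stated in the abstract, so `symmetric_kl` is shown only as one common option.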

data mining, machine learning, trajectory, (18 more...)

1911.09355

Country:

Europe > France (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Agostinelli, Andrea, Arulkumaran, Kai, Sarrico, Marta, Richemond, Pierre, Bharath, Anil Anthony

Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means

Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations in memory is essential to speed up the learning and avoid catastrophic forgetting. Unfortunately, EC methods have a large space and time complexity. We investigate different solutions to these problems based on prioritising and ranking stored states, as well as online clustering techniques. We also propose a new dynamic online k-means algorithm that is both computationally-efficient and yields significantly better performance at smaller memory sizes; we validate this approach on classic reinforcement learning environments and Atari games.
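To make the idea of online clustering with a bounded memory concrete, the sketch below implements the textbook sequential k-means update, in which each incoming point moves its nearest centroid by a running-mean step. It is not the dynamic online k-means proposed in the abstract above, whose details are not given here; the class name `OnlineKMeans` and its seeding rule (the first k points become centroids) are assumptions made for illustration.

```python
class OnlineKMeans:
    """Sequential k-means: stores only k centroids and their point counts."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.centroids = []  # list of centroid coordinate lists
        self.counts = []     # number of points assigned to each centroid

    def _nearest(self, x):
        # Index of the centroid with the smallest squared Euclidean distance to x.
        return min(
            range(len(self.centroids)),
            key=lambda i: sum((c - v) ** 2 for c, v in zip(self.centroids[i], x)),
        )

    def update(self, x):
        """Absorb one point and return the index of the centroid it was assigned to."""
        x = [float(v) for v in x]
        if len(self.centroids) < self.k:
            # Assumed seeding rule: the first k points become the initial centroids.
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        i = self._nearest(x)
        self.counts[i] += 1
        step = 1.0 / self.counts[i]  # running-mean step size
        self.centroids[i] = [c + step * (v - c) for c, v in zip(self.centroids[i], x)]
        return i


if __name__ == "__main__":
    km = OnlineKMeans(k=2)
    for point in [(0, 0), (10, 10), (2, 0), (10, 12)]:
        km.update(point)
    print(km.centroids)
```

Memory use stays fixed at k centroids regardless of how many points are seen, which is the property that makes this family of methods attractive when the memory size is limited.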

machine learning, memory size, reinforcement learning, (13 more...)

1911.0956

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (0.72)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Visual Tactile Fusion Object Clustering

Zhang, Tao, Cong, Yang, Sun, Gan, Wang, Qianqian, Ding, Zhenming

Object clustering, which aims to group similar objects into one cluster with an unsupervised strategy, has been extensively studied among various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information, while ignoring one of the most important sensing modalities, i.e., tactile information, which can help capture different object properties and further boost the performance of the object clustering task. To effectively exploit both visual and tactile modalities for object clustering, in this paper, we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn a hierarchical expression of visual-tactile fusion data and preserve the local structure of the data-generating distribution of the visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to effectively align the visual and tactile data in a common subspace in which the gap between visual and tactile data is mitigated. For the model optimization, we present an efficient alternating minimization strategy to solve our proposed model. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.

information, modality, visual and tactile data, (15 more...)

1911.0943

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Large-scale Multi-view Subspace Clustering in Linear Time

Kang, Zhao, Zhou, Wangtao, Zhao, Zhitong, Shao, Junming, Han, Meng, Xu, Zenglin

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, which typically have quadratic or even cubic complexity, are inefficient and inherently difficult to apply at large scales. In the era of big data, the computational issue becomes critical. To fill this gap, we propose a large-scale MVSC (LMVSC) algorithm with linear order complexity. Inspired by the idea of the anchor graph, we first learn a smaller graph for each view. Then, a novel approach is designed to integrate those graphs so that we can implement spectral clustering on a smaller graph. Interestingly, it turns out that our model also applies to the single-view scenario.

multi-view subspace, sscomp, subspace, (14 more...)

1911.0929

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Saha, Jayasree, Mukherjee, Jayanta

CNAK : Cluster Number Assisted K-means

arXiv.org Machine Learning Nov-20-2019

Determining the number of clusters present in a dataset is an important problem in cluster analysis. Conventional clustering techniques generally assume this parameter to be provided up front. In this paper, we propose a method which analyzes cluster stability for predicting the cluster number. Under the same computational framework, the technique also finds representatives of the clusters. The method is apt for handling big data, as we design the algorithm using Monte-Carlo simulation. Also, we explore a few pertinent issues found to be of also clustering. Experiments reveal that the proposed method is capable of identifying a single cluster. It is robust in handling high-dimensional datasets and performs reasonably well over datasets having cluster imbalance. Moreover, it can indicate cluster hierarchy, if present. Overall, we have observed significant improvement in speed and quality for predicting cluster numbers as well as the composition of clusters in a large dataset. Keywords: k-means clustering, Bipartite graph, Perfect Matching, Kuhn-Munkres Algorithm, Monte Carlo simulation. 1. Introduction. In cluster analysis, it is required to group a set of data points in a multidimensional space, so that data points in the same group are more similar to each other than to those in other groups. These groups are called clusters. Various distance functions may be used to compute the degree of similarity or dissimilarity among these data points. The Euclidean distance function is widely used in clustering. The aim of this unsupervised technique is to increase homogeneity within a group and heterogeneity between groups. Several clustering methods with different characteristics have been proposed for different purposes. Some well-known methods include partition-based clustering [26], hierarchical clustering [25], spectral clustering [27], and density-based clustering [12]. However, they require the knowledge of the cluster number for a given dataset a priori [12, 21, 26, 27, 36].

cluster number, dataset, simulation, (13 more...)

1911.08871

Country:

Europe > Finland > North Karelia > Joensuu (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India > West Bengal > Kharagpur (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence Nov-19-2019, 15:11:33 GMT

How to Train a Machine Learning Model in JASP: Clustering - JASP - Free and User-Friendly Statistical Software

This is a continuation of our series on machine learning methods that have been implemented in JASP (version 0.11 onwards). In this blog post we train a machine learning model to find clusters within our data set. The goal of a clustering task is to detect structures in the data. To do so, the algorithm needs to (1) identify the number of structures/groups in the data, and (2) figure out how the features are distributed in each group. For instance, clustering can be used to detect subgenres in electronic music, subgroups in a customer database, or to identify areas where there are greater incidences of particular types of crime.

algorithm, centroid, clustering, (13 more...)

#artificialintelligence

Country:

North America > United States > Indiana > Hamilton County > Fishers (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)

arXiv.org Machine Learning Nov-19-2019

Gromov-Wasserstein Factorization Models for Graph Clustering

Xu, Hongteng

We propose a new nonlinear factorization model for graphs with topological structures and, optionally, node attributes. This model is based on a pseudometric called Gromov-Wasserstein (GW) discrepancy, which compares graphs in a relational way. It estimates observed graphs as GW barycenters constructed by a set of atoms with different weights. By minimizing the GW discrepancy between each observed graph and its GW barycenter-based estimation, we learn the atoms and their weights associated with the observed graphs. The model achieves a novel and flexible factorization mechanism under GW discrepancy, in which both the observed graphs and the learnable atoms can be unaligned and of different sizes. We design an effective approximate algorithm for learning this Gromov-Wasserstein factorization (GWF) model, unrolling loopy computations as stacked modules and computing gradients with backpropagation. The stacked modules can take two different architectures, which correspond to the proximal point algorithm (PPA) and the Bregman alternating direction method of multipliers (BADMM), respectively. Experiments show that our model obtains encouraging results on clustering graphs. Introduction. As an important methodology for machine learning, factorization models explicitly explore the intrinsic structures of high-dimensional observations and have been widely used in many learning tasks, e.g., data clustering (Ng, Jordan, and Weiss 2002), dimensionality reduction (Candès et al. 2011), recommendation systems (Wang and Blei 2011), etc. In particular, factorization models decompose high-dimensional observations into a set of atoms under specific criteria and achieve their latent representations accordingly.

graph, gw discrepancy, module, (15 more...)

1911.0853

Country:

Asia > Middle East > Jordan (0.24)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Nov-19-2019

Deep Unsupervised Clustering with Clustered Generator Model

Zhu, Dandan, Han, Tian, Zhou, Linqi, Yang, Xiaokang, Wu, Ying Nian

However, unsupervised clustering remains one of the most fundamental challenges in machine learning because of the high dimensionality of data and the high complexity of their hidden structures. Long-established approaches for unsupervised clustering, including K-means [15] and the Gaussian Mixture Model (GMM) [3], are still the building blocks for numerous applications due to their efficiency and simplicity. However, their distance metrics are limited to data space, making them ineffective for high-dimensional data such as images. Therefore, considerable efforts have been put into obtaining a good feature embedding of data, usually of low dimensionality, for effective clustering [37]. However, the representation obtained by standalone data embedding typically cannot capture the latent structure and variation of the observed data, and may therefore be ineffective for clustering. We believe a good representation for clustering should also be able to compactly represent the observed data distribution to encode all necessary characteristics of the observation. Deep generative models (a.k.a. generator models) have shown great promise in learning latent representations for high-dimensional signals such as images and videos [32, 24, 11]. Generator models parameterized by deep neural networks specify a nonlinear mapping from latent variables to observed data. (Tian Han is the corresponding author.)

dataset, generator model, representation, (15 more...)

1911.08459

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Germany > Brandenburg > Potsdam (0.06)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)