AITopics

1910.08644

Country:

Europe > United Kingdom (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Golshanrad, Pouria, Mahini, Hamid, Bahrak, Behnam

Predicting passenger origin-destination in online taxi-hailing systems

Because of transportation planning, traffic management and dispatch optimization importance, the passenger origin-destination prediction has become one of the most important requirements for intelligent transportation systems management. In this paper, we propose a model to predict the origin and destination of travels which will occur in the next specified time window. In order to extract meaningful travel flows we use K-means clustering in four-dimensional space with maximum cluster size limitation for origin and destination. Because of large number of clusters, we use non-negative matrix factorization to decrease the number of travel clusters. We also use a stacked recurrent neural network model to predict travels count in each cluster. Comparing our results with other existing models show that our proposed model has 5-7% lower mean absolute percentage error (MAPE) for 1-hour time window, and 14% lower MAPE for 30-minute time window.

algorithm, passenger origin-destination, vector, (16 more...)

1910.08145

Country:

North America > United States (0.14)
Asia > Middle East > Iran > Tehran Province > Tehran (0.05)

Genre: Research Report (0.83)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (0.88)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Gao, Boyan, Yang, Yongxin, Gouk, Henry, Hospedales, Timothy M.

Deep clustering with concrete k-means

ABSTRACT W e address the problem of simultaneously learning a k -means clustering and deep feature representation from unlabelle d data, which is of interest due to the potential of deep k -means to outperform traditional two-step feature extraction and shallow-clustering strategies. W e achieve this by develop ing a gradient-estimator for the non-differentiable k -means objective via the Gumbel-Softmax reparameterisation trick. In contrast to previous attempts at deep clustering, our concr ete k -means model can be optimised with respect to the canonical k -means objective and is easily trained end-to-end without resorting to alternating optimisation. W e demonstrate the efficacy of our method on standard clustering benchmarks. Index T erms-- Deep Clustering, Unsupervised Learning, Gradient Estimator 1. INTRODUCTION Clustering is a fundamental task in unsupervised machine learning, and one with numerous applications.

assignment, objective, representation, (16 more...)

1910.08031

Country:

Europe > United Kingdom > England > Surrey (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Fan, Xinjie, Yue, Yuguang, Sarkar, Purnamrita, Wang, Y. X. Rachel

A Unified Framework for Tuning Hyperparameters in Clustering Problems

Selecting hyperparameters for unsupervised learning problems is difficult in general due to the lack of ground truth for validation. However, this issue is prevalent in machine learning, especially in clustering problems with examples including the Lagrange multipliers of penalty terms in semidefinite programming (SDP) relaxations and the bandwidths used for constructing kernel similarity matrices for Spectral Clustering. Despite this, there are not many provable algorithms for tuning these hyperparameters. In this paper, we provide a unified framework with provable guarantees for the above class of problems. We demonstrate our method on two distinct models. First, we show how to tune the hyperparameters in widely used SDP algorithms for community detection in networks. In this case, our method can also be used for model selection. Second, we show the same framework works for choosing the bandwidth for the kernel similarity matrix in Spectral Clustering for subgaussian mixtures under suitable model specification. In a variety of simulation experiments, we show that our framework outperforms other widely used tuning procedures in a broad range of parameter settings.

algorithm, high probability, matrix, (12 more...)

1910.08018

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Kopf, Andreas, Fortuin, Vincent, Somnath, Vignesh Ram, Claassen, Manfred

Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations

Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering gained popularity due to the non-linearity of neural networks, which allows for flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on synthetic data, the MNIST benchmark data set, and a challenging real-world task of defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines and we show that the MoE architecture in the decoder reduces the computational cost of sampling specific data modes with high fidelity.

ctober 18, latent representation, moe-sim-v ae, (13 more...)

1910.07763

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

#artificialintelligenceOct-16-2019, 21:25:20 GMT

Creating a k-means clustering model BigQuery ML Google Cloud

Enter the following standard SQL query in the Query editor text area. When the query is complete, click the Results tab below the query text area. The results tab shows the columns you queried that are used to train your model: station_name, duration, num_trips, distance_from_city_center. The results should look like the following. Now that you have examined your training data, the next step is to create a k-means model using the data. You can create and train a k-means model using the CREATE MODEL statement with the option model_type kmeans.

dataset, k-means model, query, (10 more...)

#artificialintelligence

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)

#artificialintelligenceOct-16-2019, 16:12:43 GMT

Customer Segmentation Using K Means Clustering - WebSystemer.no

I started with loading all the libraries and dependencies. The columns in the dataset are customer id, gender, age, income and spending score. I dropped the id column as that does not seem relevant to the context. Also I plotted the age frequency of customers. Next I made a box plot of spending score and annual income to better visualize the distribution range. The range of spending score is clearly more than the annual income range.

annual income, customer, spending score, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.77)

arXiv.org Machine LearningOct-16-2019

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

Dell'Amico, Matteo

FISHDBC is a flexible, incremental, scalable, and hierarchical density-based clustering algorithm. It is flexible because it empowers users to work on arbitrary data, skipping the feature extraction step that usually transforms raw data in numeric arrays letting users define an arbitrary distance function instead. It is incremental and scalable: it avoids the $\mathcal O(n^2)$ performance of other approaches in non-metric spaces and requires only lightweight computation to update the clustering when few items are added. It is hierarchical: it produces a "flat" clustering which can be expanded to a tree structure, so that users can group and/or divide clusters in sub- or super-clusters when data exploration requires so. It is density-based and approximates HDBSCAN*, an evolution of DBSCAN.

dataset, distance function, fishdbc, (16 more...)

1910.07283

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(16 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ebli, Stefania, Spreemann, Gard

A Notion of Harmonic Clustering in Simplicial Complexes

arXiv.org Machine LearningOct-16-2019

We outline a novel clustering scheme for simplicial complexes that produces clusters of simplices in a way that is sensitive to the homology of the complex. The method is inspired by, and can be seen as a higher-dimensional version of, graph spectral clustering. The algorithm involves only sparse eigenproblems, and is therefore computationally efficient. We believe that it has broad application as a way to extract features from simplicial complexes that often arise in topological data analysis.

simplicial complex, spectral, subspace, (17 more...)

1910.07247

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

He, Hangfeng, Su, Weijie J.

The Local Elasticity of Neural Networks

arXiv.org Machine LearningOct-15-2019

This paper presents a phenomenon in neural networks that we refer to as \textit{local elasticity}. Roughly speaking, a classifier is said to be locally elastic if its prediction at a feature vector $\bx'$ is \textit{not} significantly perturbed, after the classifier is updated via stochastic gradient descent at a (labeled) feature vector $\bx$ that is \textit{dissimilar} to $\bx'$ in a certain sense. This phenomenon is shown to persist for neural networks with nonlinear activation functions through extensive simulations on real-life and synthetic datasets, whereas this is not observed in linear classifiers. In addition, we offer a geometric interpretation of local elasticity using the neural tangent kernel \citep{jacot2018neural}. Building on top of local elasticity, we obtain pairwise similarity measures between feature vectors, which can be used for clustering in conjunction with $K$-means. The effectiveness of the clustering algorithm on the MNIST and CIFAR-10 datasets in turn corroborates the hypothesis of local elasticity of neural networks on real-life data. Finally, we discuss some implications of local elasticity to shed light on several intriguing aspects of deep neural networks.

local elasticity, neural network, similarity, (13 more...)

1910.06943

Country: North America > United States > Pennsylvania (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)