Goto

Collaborating Authors

 Clustering


Clustering and Analysis of GPS Trajectory Data using Distance-based Features

arXiv.org Artificial Intelligence

The proliferation of smartphones has accelerated mobility studies by largely increasing the type and volume of mobility data available. One such source of mobility data is from GPS technology, which is becoming increasingly common and helps the research community understand mobility patterns of people. However, there lacks a standardized framework for studying the different mobility patterns created by the non-Work, non-Home locations of Working and Nonworking users on Workdays and Offdays using machine learning methods. We propose a new mobility metric, Daily Characteristic Distance, and use it to generate features for each user together with Origin-Destination matrix features. We then use those features with an unsupervised machine learning method, $k$-means clustering, and obtain three clusters of users for each type of day (Workday and Offday). Finally, we propose two new metrics for the analysis of the clustering results, namely User Commonality and Average Frequency. By using the proposed metrics, interesting user behaviors can be discerned and it helps us to better understand the mobility patterns of the users.


An Introduction to Kernel and Operator Learning Methods for Homogenization by Self-consistent Clustering Analysis

arXiv.org Artificial Intelligence

Recent advances in operator learning theory have improved our knowledge about learning maps between infinite dimensional spaces. However, for large-scale engineering problems such as concurrent multiscale simulation for mechanical properties, the training cost for the current operator learning methods is very high. The article presents a thorough analysis on the mathematical underpinnings of the operator learning paradigm and proposes a kernel learning method that maps between function spaces. We first provide a survey of modern kernel and operator learning theory, as well as discuss recent results and open problems. From there, the article presents an algorithm to how we can analytically approximate the piecewise constant functions on R for operator learning. This implies the potential feasibility of success of neural operators on clustered functions. Finally, a k-means clustered domain on the basis of a mechanistic response is considered and the Lippmann-Schwinger equation for micro-mechanical homogenization is solved. The article briefly discusses the mathematics of previous kernel learning methods and some preliminary results with those methods. The proposed kernel operator learning method uses graph kernel networks to come up with a mechanistic reduced order method for multiscale homogenization.


Clustering-Induced Generative Incomplete Image-Text Clustering (CIGIT-C)

arXiv.org Artificial Intelligence

The target of image-text clustering (ITC) is to find correct clusters by integrating complementary and consistent information of multi-modalities for these heterogeneous samples. However, the majority of current studies analyse ITC on the ideal premise that the samples in every modality are complete. This presumption, however, is not always valid in real-world situations. The missing data issue degenerates the image-text feature learning performance and will finally affect the generalization abilities in ITC tasks. Although a series of methods have been proposed to address this incomplete image text clustering issue (IITC), the following problems still exist: 1) most existing methods hardly consider the distinct gap between heterogeneous feature domains. 2) For missing data, the representations generated by existing methods are rarely guaranteed to suit clustering tasks. 3) Existing methods do not tap into the latent connections both inter and intra modalities. In this paper, we propose a Clustering-Induced Generative Incomplete Image-Text Clustering(CIGIT-C) network to address the challenges above. More specifically, we first use modality-specific encoders to map original features to more distinctive subspaces. The latent connections between intra and inter-modalities are thoroughly explored by using the adversarial generating network to produce one modality conditional on the other modality. Finally, we update the corresponding modalityspecific encoders using two KL divergence losses. Experiment results on public image-text datasets demonstrated that the suggested method outperforms and is more effective in the IITC job.


Learning non-stationary and discontinuous functions using clustering, classification and Gaussian process modelling

arXiv.org Artificial Intelligence

Surrogate models have shown to be an extremely efficient aid in solving engineering problems that require repeated evaluations of an expensive computational model. They are built by sparsely evaluating the costly original model and have provided a way to solve otherwise intractable problems. A crucial aspect in surrogate modelling is the assumption of smoothness and regularity of the model to approximate. This assumption is however not always met in reality. For instance in civil or mechanical engineering, some models may present discontinuities or non-smoothness, e.g., in case of instability patterns such as buckling or snap-through. Building a single surrogate model capable of accounting for these fundamentally different behaviors or discontinuities is not an easy task. In this paper, we propose a three-stage approach for the approximation of non-smooth functions which combines clustering, classification and regression. The idea is to split the space following the localized behaviors or regimes of the system and build local surrogates that are eventually assembled. A sequence of well-known machine learning techniques are used: Dirichlet process mixtures models (DPMM), support vector machines and Gaussian process modelling. The approach is tested and validated on two analytical functions and a finite element model of a tensile membrane structure.


Extracting Semantic Knowledge from GANs with Unsupervised Learning

arXiv.org Artificial Intelligence

Recently, unsupervised learning has made impressive progress on various tasks. Despite the dominance of discriminative models, increasing attention is drawn to representations learned by generative models and in particular, Generative Adversarial Networks (GANs). Previous works on the interpretation of GANs reveal that GANs encode semantics in feature maps in a linearly separable form. In this work, we further find that GAN's features can be well clustered with the linear separability assumption. We propose a novel clustering algorithm, named KLiSH, which leverages the linear separability to cluster GAN's features. KLiSH succeeds in extracting fine-grained semantics of GANs trained on datasets of various objects, e.g., car, portrait, animals, and so on. With KLiSH, we can sample images from GANs along with their segmentation masks and synthesize paired image-segmentation datasets. Using the synthesized datasets, we enable two downstream applications. First, we train semantic segmentation networks on these datasets and test them on real images, realizing unsupervised semantic segmentation. Second, we train image-to-image translation networks on the synthesized datasets, enabling semantic-conditional image synthesis without human annotations.


Overlapping oriented imbalanced ensemble learning method based on projective clustering and stagewise hybrid sampling

arXiv.org Artificial Intelligence

The challenge of imbalanced learning lies not only in class imbalance problem, but also in the class overlapping problem which is complex. However, most of the existing algorithms mainly focus on the former. The limitation prevents the existing methods from breaking through. To address this limitation, this paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS). The DCSHS has three parts. Firstly, we design a projection clustering combination framework (PCC) guided by Davies-Bouldin clustering effectiveness index (DBI), which is used to obtain high-quality clusters and combine them to obtain a set of cross-complete subsets (CCS) with balanced class and low overlapping. Secondly, according to the characteristics of subset classes, a stage-wise hybrid sampling algorithm is designed to realize the de-overlapping and balancing of subsets. Finally, a projective clustering transfer mapping mechanism (CTM) is constructed for all processed subsets by means of transfer learning, thereby reducing class overlapping and explore structure information of samples. The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples, and learn as much information of overlapping samples as possible, thereby enhancing the class overlapping while class balancing. In the experimental section, more than 30 public datasets and over ten representative algorithms are chosen for verification. The experimental results show that the DCSHS is significantly best in terms of various evaluation criteria.


Semisoft Task Clustering for Multi-Task Learning

arXiv.org Artificial Intelligence

Multi-task learning (MTL) aims to improve the performance of multiple related prediction tasks by leveraging useful information from them. Due to their flexibility and ability to reduce unknown coefficients substantially, the task-clustering-based MTL approaches have attracted considerable attention. Motivated by the idea of semisoft clustering of data, we propose a semisoft task clustering approach, which can simultaneously reveal the task cluster structure for both pure and mixed tasks as well as select the relevant features. The main assumption behind our approach is that each cluster has some pure tasks, and each mixed task can be represented by a linear combination of pure tasks in different clusters. To solve the resulting non-convex constrained optimization problem, we design an efficient three-step algorithm. The experimental results based on synthetic and real-world datasets validate the effectiveness and efficiency of the proposed approach. Finally, we extend the proposed approach to a robust task clustering problem.


Sketch-and-solve approaches to k-means clustering by semidefinite programming

arXiv.org Artificial Intelligence

One of the most fundamental data processing tasks is clustering. Here, one is given a collection of objects, a notion of similarity between those objects, and a clustering objective that scores any given partition according to how well it clusters similar objects together. The goal is then to partition the objects in a way that optimizes this clustering objective. For example, to partition the vertices of a simple graph into two clusters, one might take the clustering objective to be edge cut in the graph complement. This clustering problem is equivalent to MAX-CUT, which is known to be NP-hard [21].


A Faster $k$-means++ Algorithm

arXiv.org Artificial Intelligence

K-means++ is an important algorithm to choose initial cluster centers for the k-means clustering algorithm. In this work, we present a new algorithm that can solve the $k$-means++ problem with near optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm runs in $\widetilde{O}(k )$ iterations, and each iteration takes $\widetilde{O}(nd k)$ time. The overall running time is thus $\widetilde{O}(n d k^2)$. We propose a new algorithm \textsc{FastKmeans++} that only takes in $\widetilde{O}(nd + nk^2)$ time, in total.


Path Planning for Concentric Tube Robots: a Toolchain with Application to Stereotactic Neurosurgery

arXiv.org Artificial Intelligence

Abstract: We present a toolchain for solving path planning problems for concentric tube robots through obstacle fields. First, ellipsoidal sets representing the target area and obstacles are constructed from labelled point clouds. Then, the nonlinear and highly nonconvex optimal control problem is solved by introducing a homotopy on the obstacle positions where at one extreme of the parameter the obstacles are removed from the operating space, and at the other extreme they are located at their intended positions. We present a detailed example (with more than a thousand obstacles) from stereotactic neurosurgery with real-world data obtained from labelled MPRI scans. Keywords: optimal control, non-linear programming, homotopy methods, optimal path planning, concentric tube robots, stereotactic neurosurgery 1. INTRODUCTION by finding paths of connected voxels that the robot can traverse, subject to constraints on its curvature.