 Clustering


In-Session Personalization for Talent Search

arXiv.org Artificial Intelligence

Previous efforts in candidate recommendation for talent search followed the general pattern of receiving initial search criteria and generating a set of candidates with a pre-trained model. Traditionally, the generated recommendations are final; that is, the list of potential candidates is not modified unless the user explicitly changes the search criteria. In this paper, we propose a candidate recommendation model that takes into account the user's immediate feedback and updates the candidate recommendations at each step. This setting also allows for very uninformative initial search queries, since we pinpoint the user's intent from the feedback given during the search session. To achieve our goal, we employ an intent clustering method based on topic modeling, which separates the candidate space into meaningful, possibly overlapping subsets (which we call intent clusters) for each position. On top of these candidate segments, we apply a multi-armed bandit approach to choose which intent cluster is most appropriate for the current session. We also present an online learning scheme that updates the intent clusters within the session in response to user feedback, to achieve further personalization. Our offline experiments, as well as the results from the online deployment of our solution, demonstrate the benefits of our proposed methodology.
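As a rough illustration of choosing among intent clusters with a multi-armed bandit, the sketch below uses Beta-Bernoulli Thompson sampling. The abstract does not say which bandit algorithm the authors use, so Thompson sampling, the class name `ClusterBandit`, the binary accept/reject feedback, and the simulated acceptance rates are all assumptions made for this example.

```python
import random


class ClusterBandit:
    """Thompson sampling over a fixed set of intent clusters (hypothetical sketch)."""

    def __init__(self, n_clusters, seed=None):
        self.rng = random.Random(seed)
        # One Beta(1, 1) prior per cluster; alpha counts accepts, beta counts rejects.
        self.alpha = [1.0] * n_clusters
        self.beta = [1.0] * n_clusters

    def choose(self):
        # Draw a plausible acceptance rate for each cluster and pick the largest draw.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, cluster, accepted):
        # Immediate user feedback on a candidate drawn from `cluster`.
        if accepted:
            self.alpha[cluster] += 1.0
        else:
            self.beta[cluster] += 1.0


def simulate(true_rates, steps=2000, seed=0):
    """Simulate one long session; true_rates are made-up acceptance probabilities."""
    bandit = ClusterBandit(len(true_rates), seed=seed)
    env = random.Random(seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(steps):
        cluster = bandit.choose()
        pulls[cluster] += 1
        bandit.update(cluster, env.random() < true_rates[cluster])
    return pulls


if __name__ == "__main__":
    print(simulate([0.1, 0.2, 0.8]))
```

In this toy setting the bandit concentrates its recommendations on the cluster with the highest acceptance rate while still occasionally exploring the others, which mirrors the idea of updating recommendations at each feedback step.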


Understanding K-means Clustering in Machine Learning

#artificialintelligence

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Typically, unsupervised algorithms make inferences from datasets using only input vectors, without referring to known, or labelled, outcomes. AndreyBu, who has more than 5 years of machine learning experience and currently teaches people his skills, says that "the objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset." A cluster refers to a collection of data points aggregated together because of certain similarities. You'll define a target number k, which refers to the number of centroids you need in the dataset.
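To make the role of k and the centroids concrete, here is a minimal sketch of the standard Lloyd iteration for K-means in plain Python. The function name `kmeans`, the random initialization from the data points, and the toy data are assumptions for illustration; a production implementation would normally come from a library.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Initialize the k centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Stop when no assignment changed since the previous iteration.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, assignment = kmeans(data, k=2)
    print(centers, assignment)
```

The result depends on the initial centroids, which is why practical implementations usually run several random restarts and keep the best one.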


Memory Efficient Experience Replay for Streaming Learning

arXiv.org Machine Learning

In supervised machine learning, an agent is typically trained once and then deployed. While this works well for static settings, robots often operate in changing environments and must quickly learn new things from data streams. In this paradigm, known as streaming learning, a learner is trained online, in a single pass, from a data stream that cannot be assumed to be independent and identically distributed (iid). Streaming learning will cause conventional deep neural networks (DNNs) to fail for two reasons: 1) they need multiple passes through the entire dataset; and 2) non-iid data will cause catastrophic forgetting. An old fix to both of these issues is rehearsal. To learn a new example, rehearsal mixes it with previous examples, and then this mixture is used to update the DNN. Full rehearsal is slow and memory intensive because it stores all previously observed examples, and its effectiveness for preventing catastrophic forgetting has not been studied in modern DNNs. Here, we describe the ExStream algorithm for memory efficient rehearsal and compare it to alternatives. We find that full rehearsal can eliminate catastrophic forgetting in a variety of streaming learning settings, with ExStream performing well using far less memory and computation.
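The rehearsal idea can be sketched without any particular model: each incoming example is mixed with a few stored examples before an update. The sketch below is not ExStream. It swaps in reservoir sampling as a simple way to keep a bounded buffer, and the names `rehearsal_stream`, `capacity`, and `replay_k` are hypothetical.

```python
import random


def rehearsal_stream(stream, capacity, replay_k, seed=0):
    """Yield one training mini-batch per streamed example.

    Each batch holds the new example followed by up to `replay_k` examples
    replayed from a buffer of at most `capacity` past examples.
    """
    rng = random.Random(seed)
    buffer = []
    for seen, example in enumerate(stream, start=1):
        # Mix the new example with a random draw of previously stored ones.
        replay = rng.sample(buffer, min(replay_k, len(buffer)))
        yield [example] + replay
        # Reservoir sampling keeps a uniform sample of everything seen so far.
        if len(buffer) < capacity:
            buffer.append(example)
        else:
            j = rng.randrange(seen)
            if j < capacity:
                buffer[j] = example


if __name__ == "__main__":
    for batch in rehearsal_stream(range(8), capacity=3, replay_k=2):
        print(batch)  # a real learner would run one update step on this batch
```

Setting `capacity` to the stream length recovers full rehearsal, which stores every observed example; smaller values trade memory for coverage of the past.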


Ensemble Clustering for Graphs

arXiv.org Machine Learning

Many data-sets are relational in nature, describing interactions between entities, such as friendship networks, communications or geographical co-locations. Most networks that arise in nature exhibit complex structure [1, 2], with subsets of vertices densely interconnected relative to the rest of the network, which we call communities or clusters. Binary relational data-sets are typically represented as graphs G = (V, E), where vertices v ∈ V represent the entities, and edges e ∈ E represent the relations between pairs of entities. For analyzing and exploring complex relational data-sets, graph clustering is commonly used. In this paper, we propose ECG (Ensemble Clustering for Graphs), a graph clustering method based on the concept of co-association consensus clustering. We show that this approach identifies very high quality clusters by replicating the study in [3] and comparing ECG against the best performing algorithms. We also demonstrate that ECG is stable despite being a randomized algorithm and that it significantly reduces the resolution limit problem, yielding a number of clusters very close to the ground truth partition size. Finally, ECG provides information about the strength of the associations between entities, which can be used to determine the presence or absence of communities in the network.
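The co-association idea behind consensus clustering can be shown in a few lines: given several partitions of the same vertices, score each edge by the fraction of partitions that place its two endpoints in the same cluster. This is a generic sketch of that concept, not the ECG algorithm itself; the function name `coassociation_weights` and the toy partitions are assumptions.

```python
def coassociation_weights(edges, partitions):
    """Return {edge: fraction of partitions in which both endpoints co-cluster}.

    edges      -- iterable of (u, v) vertex index pairs
    partitions -- list of label lists; partitions[i][v] is the cluster of
                  vertex v in the i-th partition
    """
    weights = {}
    for u, v in edges:
        together = sum(1 for labels in partitions if labels[u] == labels[v])
        weights[(u, v)] = together / len(partitions)
    return weights


if __name__ == "__main__":
    # A path graph on four vertices and three hypothetical partitions of it.
    edges = [(0, 1), (1, 2), (2, 3)]
    partitions = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
    ]
    print(coassociation_weights(edges, partitions))
```

Edges with weights near 1 are consistently inside a community, while low weights mark edges that most partitions cut, which is the kind of association-strength information the abstract refers to.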


Random Warping Series: A Random Features Method for Time-Series Embedding

arXiv.org Machine Learning

Time series data analytics has been a problem of substantial interest for decades, and Dynamic Time Warping (DTW) has been the most widely adopted technique to measure dissimilarity between time series. A number of global-alignment kernels have since been proposed in the spirit of DTW to extend its use to kernel-based estimation methods such as support vector machines. However, those kernels suffer from diagonal dominance of the Gram matrix and a quadratic complexity w.r.t. the sample size. In this work, we study a family of alignment-aware positive definite (p.d.) kernels, with its feature embedding given by a distribution of Random Warping Series (RWS). The proposed kernel does not suffer from the issue of diagonal dominance and naturally admits a Random Features (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in terms of both the number and the length of time-series. We also study the convergence of the RF approximation for the domain of time series of unbounded length. Our extensive experiments on 16 benchmark datasets demonstrate that RWS outperforms or matches state-of-the-art classification and clustering methods in both accuracy and computational time. Our code and data are available at https://github.com/IBM/RandomWarpingSeries.
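For readers unfamiliar with DTW, the following is a minimal sketch of the classic dynamic-programming recurrence for two one-dimensional series, using absolute difference as the local cost. It illustrates the baseline dissimilarity measure only, not the RWS embedding; the function name `dtw_distance` is an assumption.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # a repeated value can be warped away
    print(dtw_distance([0, 0], [1, 1]))
```

The nested loops fill an n-by-m table, so a single comparison is quadratic in the series length, and comparing every pair in a dataset is quadratic in the number of series as well.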


CNN features are also great at unsupervised classification

arXiv.org Artificial Intelligence

This paper aims to provide insight into the transferability of deep CNN features to unsupervised problems. We study the impact of different pretrained CNN feature extractors on the problem of image set clustering for object classification as well as fine-grained classification. We propose a rather straightforward pipeline combining deep-feature extraction using a CNN pretrained on ImageNet and a classic clustering algorithm to classify sets of images. This approach is compared to state-of-the-art algorithms in image clustering and provides better results. These results strengthen the belief that supervised training of deep CNNs on large datasets, with a large variability of classes, extracts better features than most carefully designed engineering approaches, even for unsupervised tasks. We also validate our approach on a robotic application, which consists of sorting and storing objects smartly based on clustering.


ClusterGAN : Latent Space Clustering in Generative Adversarial Networks

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) have obtained remarkable success in many unsupervised learning tasks, and clustering is unarguably an important unsupervised learning problem. While one can potentially exploit the latent-space back-projection in GANs to cluster, we demonstrate that the cluster structure is not retained in the GAN latent space. In this paper, we propose ClusterGAN as a new mechanism for clustering using GANs. By sampling latent variables from a mixture of one-hot encoded variables and continuous latent variables, coupled with an inverse network (which projects the data to the latent space) trained jointly with a clustering-specific loss, we are able to achieve clustering in the latent space. Our results show a remarkable phenomenon: GANs can preserve latent space interpolation across categories, even though the discriminator is never exposed to such vectors. We compare our results with various clustering baselines and demonstrate superior performance on both synthetic and real datasets.


Clustering of graph vertex subset via Krylov subspace model reduction

arXiv.org Machine Learning

Clustering via graph-Laplacian spectral imbedding is ubiquitous in data science and machine learning. However, it becomes less efficient for large data sets due to two factors. First, computing the partial eigendecomposition of the graph-Laplacian typically requires a large Krylov subspace. Second, after the spectral imbedding is complete, the clustering is typically performed with various relaxations of k-means, which are prone to getting stuck in local minima and scale poorly in computational cost for large data sets. Here we propose two novel algorithms for spectral clustering of a subset of the graph vertices (target subset) based on the theory of model order reduction. They rely on realizations of a reduced order model (ROM) that accurately approximates the diffusion transfer function of the original graph for inputs and outputs restricted to the target subset. While our focus is limited to this subset, our algorithms produce a clustering of it that is consistent with the overall structure of the graph. Moreover, working with a small target subset greatly reduces the required dimension of the Krylov subspace and allows us to exploit the approximations of k-means in the regimes where they are most robust and efficient, as verified by the numerical experiments. There are several uses for our algorithms. First, they can be employed on their own to cluster a representative subset in cases when the full graph clustering is either infeasible or not required. Second, they may be used for quality control. Third, as they drastically reduce the problem size, they enable the application of more powerful approximations of k-means, like those based on semi-definite programming (SDP), instead of the conventional Lloyd's algorithm. Finally, they can be used as building blocks of a divide-and-conquer algorithm for the full graph clustering. The latter will be reported in a separate article.


On a 'Two Truths' Phenomenon in Spectral Graph Clustering

arXiv.org Machine Learning

Clustering is concerned with coherently grouping observations without any explicit concept of true groupings. Spectral graph clustering - clustering the vertices of a graph based on their spectral embedding - is commonly approached via K-means (or, more generally, Gaussian mixture model) clustering composed with either Laplacian or Adjacency spectral embedding (LSE or ASE). Recent theoretical results provide new understanding of the problem and solutions, and lead us to a 'Two Truths' LSE vs. ASE spectral graph clustering phenomenon convincingly illustrated here via a diffusion MRI connectome data set: the different embedding methods yield different clustering results, with LSE capturing left hemisphere/right hemisphere affinity structure and ASE capturing gray matter/white matter core-periphery structure.


FI-GRL: Fast Inductive Graph Representation Learning via Projection-Cost Preservation

arXiv.org Machine Learning

Graph representation learning aims at transforming graph data into meaningful low-dimensional vectors to facilitate the use of machine learning and data mining algorithms designed for general data. Most current graph representation learning approaches are transductive, which means that they require all the nodes in the graph to be known when learning graph representations, and they cannot naturally generalize to unseen nodes. In this paper, we present a Fast Inductive Graph Representation Learning framework (FI-GRL) to learn nodes' low-dimensional representations. Our approach can obtain accurate representations for seen nodes with provable theoretical guarantees and can easily generalize to unseen nodes. Specifically, in order to explicitly decouple nodes' relations expressed by the graph, we transform nodes into a randomized subspace spanned by a random projection matrix. This stage is guaranteed to preserve the projection-cost of the normalized random walk matrix, which is highly related to the normalized cut of the graph. Then feature extraction is achieved by conducting singular value decomposition on the obtained matrix sketch. By leveraging the property of projection-cost preservation on the matrix sketch, the obtained representation result is nearly optimal. To deal with unseen nodes, we use a folding-in technique to learn their meaningful representations. Empirically, when the number of seen nodes is larger than that of unseen nodes, FI-GRL always achieves excellent results. Our algorithm is fast, simple to implement and theoretically guaranteed. Extensive experiments on real datasets demonstrate the superiority of our algorithm in both efficacy and efficiency on both macroscopic level (clustering) and microscopic level (structural hole detection) applications.