Goto

Collaborating Authors

 Clustering


Automatic Parameter Tying: A New Approach for Regularized Parameter Learning in Markov Networks

AAAI Conferences

Parameter tying is a regularization method in which parameters (weights) of a machine learning model are partitioned into groups by leveraging prior knowledge and all parameters in each group are constrained to take the same value. In this paper, we consider the problem of parameter learning in Markov networks and propose a novel approach called automatic parameter tying (APT) that uses automatic instead of a priori and soft instead of hard parameter tying as a regularization method to alleviate overfitting. The key idea behind APT is to set up the learning problem as the task of finding parameters and groupings of parameters such that the likelihood plus a regularization term is maximized. The regularization term penalizes models where parameter values deviate from their group mean parameter value. We propose and use a block coordinate ascent algorithm to solve the optimization task. We analyze the sample complexity of our new learning algorithm and show that it yields optimal parameters with high probability when the groups are well separated. Experimentally, we show that our method improves upon L 2 regularization and suggest several pragmatic techniques for good practical performance.


Bayesian Robust Attributed Graph Clustering: Joint Learning of Partial Anomalies and Group Structure

AAAI Conferences

We study the problem of robust attributed graph clustering. In real data, the clustering structure is often obfuscated due to anomalies or corruptions. While robust methods have been recently introduced that handle anomalies as part of the clustering process, they all fail to account for one core aspect: Since attributed graphs consist of two views (network structure and attributes) anomalies might materialize only partially, i.e. instances might be corrupted in one view but perfectly fit in the other. In this case, we can still derive meaningful cluster assignments. Existing works only consider complete anomalies. In this paper, we present a novel probabilistic generative model (PAICAN) that explicitly models partial anomalies by generalizing ideas of Degree Corrected Stochastic Block Models and Bernoulli Mixture Models. We provide a highly scalable variational inference approach with runtime complexity linear in the number of edges. The robustness of our model w.r.t. anomalies is demonstrated by our experimental study, outperforming state-of-the-art competitors.


Dependence Guided Unsupervised Feature Selection

AAAI Conferences

In the past decade, various sparse learning based unsupervised feature selection methods have been developed. However, most existing studies adopt a two-step strategy, i.e., selecting the top-m features according to a calculated descending order and then performing K-means clustering, resulting in a group of sub-optimal features. To address this problem, we propose a Dependence Guided Unsupervised Feature Selection (DGUFS) method to select features and partition data in a joint manner. Our proposed method enhances the inter-dependence among original data, cluster labels, and selected features. In particular, a projection-free feature selection model is proposed based on l20-norm equality constraints. We utilize the learned cluster labels to fill in the information gap between original data and selected features. Two dependence guided terms are consequently proposed for our model. More specifically, one term increases the dependence of desired cluster labels on original data, while the other term maximizes the dependence of selected features on cluster labels to guide the process of feature selection. Last but not least, an iterative algorithm based on Alternating Direction Method of Multipliers (ADMM) is designed to solve the constrained minimization problem efficiently. Extensive experiments on different datasets consistently demonstrate that our proposed method significantly outperforms state-of-the-art baselines.


Exact Clustering via Integer Programming and Maximum Satisfiability

AAAI Conferences

We consider the following general graph clustering problem: given a complete undirected graph G=(V,E,c) with an edge weight function c:E->Q, we are asked to find a partition C of V that maximizes the sum of edge weights within the clusters in C. Owing to its high generality, this problem has a wide variety of real-world applications, including correlation clustering, group technology, and community detection. In this study, we investigate the design of mathematical programming formulations and constraint satisfaction formulations for the problem. First, we present a novel integer linear programming (ILP) formulation that has far fewer constraints than the standard ILP formulation by Groetschel and Wakabayashi (1989). Second, we propose an ILP-based exact algorithm that solves an ILP problem obtained by modifying our above ILP formulation and then performs simple post-processing to produce an optimal solution to the original problem. Third, we present maximum satisfiability (MaxSAT) counterparts of both our ILP formulation and ILP-based exact algorithm. Computational experiments using well-known real-world datasets demonstrate that our ILP-based approaches and their MaxSAT counterparts are highly effective in terms of both memory efficiency and computation time.


Partial Multi-View Outlier Detection Based on Collective Learning

AAAI Conferences

In the past decade, various multi-view outlier detection methods have been designed to detect horizontal outliers that exhibit inconsistent across-view characteristics. The existing works assume that all objects are present in all views. However, in real-world applications, it is often the incomplete case that every view may suffer from some missing samples, resulting in partial objects difficult to detect outliers from. To address this problem, we propose a novel Collective Learning (CL) based framework to detect outliers from partial multi-view data in a self-guided way. More specifically, by well exploiting the inter-dependence among different views, we develop an algorithm to reconstruct missing samples based on learning. Furthermore, we propose similarity-based outlier detection to break through the dilemma that the number of clusters is unknown priori. Then, the calculated outlier scores act as the confidence levels in CL and in turn guide the reconstruction of missing data. Learning-based missing sample recovery and similarity-based outlier detection are iteratively performed in a self-guided manner. Experimental results on benchmark datasets show that our proposed approach consistently and significantly outperforms state-of-the-art baselines.


Learning Latent Representations in Neural Networks for Clustering through Pseudo Supervision and Graph-based Activity Regularization

arXiv.org Machine Learning

In this paper, we propose a novel unsupervised clustering approach exploiting the hidden information that is indirectly introduced through a pseudo classification objective. Specifically, we randomly assign a pseudo parent-class label to each observation which is then modified by applying the domain specific transformation associated with the assigned label. Generated pseudo observation-label pairs are subsequently used to train a neural network with Auto-clustering Output Layer (ACOL) that introduces multiple softmax nodes for each pseudo parent-class. Due to the unsupervised objective based on Graph-based Activity Regularization (GAR) terms, softmax duplicates of each parent-class are specialized as the hidden information captured through the help of domain specific transformations is propagated during training. Ultimately we obtain a k-means friendly latent representation. Furthermore, we demonstrate how the chosen transformation type impacts performance and helps propagate the latent information that is useful in revealing unknown clusters. Our results show state-of-the-art performance for unsupervised clustering tasks on MNIST, SVHN and USPS datasets, with the highest accuracies reported to date in the literature.


Go With the Flow, on Jupiter and Snow. Coherence From Model-Free Video Data without Trajectories

arXiv.org Machine Learning

Viewing a data set such as the clouds of Jupiter, coherence is readily apparent to human observers, especially the Great Red Spot, but also other great storms and persistent structures. There are now many different definitions and perspectives mathematically describing coherent structures, but we will take an image processing perspective here. We describe an image processing perspective inference of coherent sets from a fluidic system directly from image data, without attempting to first model underlying flow fields, related to a concept in image processing called motion tracking. In contrast to standard spectral methods for image processing which are generally related to a symmetric affinity matrix, leading to standard spectral graph theory, we need a not symmetric affinity which arises naturally from the underlying arrow of time. We develop an anisotropic, directed diffusion operator corresponding to flow on a directed graph, from a directed affinity matrix developed with coherence in mind, and corresponding spectral graph theory from the graph Laplacian. Our methodology is not offered as more accurate than other traditional methods of finding coherent sets, but rather our approach works with alternative kinds of data sets, in the absence of vector field. Our examples will include partitioning the weather and cloud structures of Jupiter, and a local to Potsdam, N.Y. lake-effect snow event on Earth, as well as the benchmark test double-gyre system.


Sketched Subspace Clustering

arXiv.org Machine Learning

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data. Subspace clustering (SC) is a relatively recent method that is able to successfully classify nonlinearly separable data in a multitude of settings. In spite of their high clustering accuracy, SC methods incur prohibitively high computational complexity when processing large volumes of high-dimensional data. Inspired by random sketching approaches for dimensionality reduction, the present paper introduces a randomized scheme for SC, termed Sketch-SC, tailored for large volumes of high-dimensional data. Sketch-SC accelerates the computationally heavy parts of state-of-the-art SC approaches by compressing the data matrix across both dimensions using random projections, thus enabling fast and accurate large-scale SC. Performance analysis as well as extensive numerical tests on real data corroborate the potential of Sketch-SC and its competitive performance relative to state-of-the-art scalable SC approaches.


A Theoretical Investigation of Graph Degree as an Unsupervised Normality Measure

arXiv.org Machine Learning

For a graph representation of a dataset, a straightforward normality measure for a sample can be its graph degree. Considering a weighted graph, degree of a sample is the sum of the corresponding row's values in a similarity matrix. The measure is intuitive given the abnormal samples are usually rare and they are dissimilar to the rest of the data. In order to have an in-depth theoretical understanding, in this manuscript, we investigate the graph degree in spectral graph clustering based and kernel based point of views and draw connections to a recent kernel method for the two sample problem. We show that our analyses guide us to choose fully-connected graphs whose edge weights are calculated via universal kernels. We show that a simple graph degree based unsupervised anomaly detection method with the above properties, achieves higher accuracy compared to other unsupervised anomaly detection methods on average over 10 widely used datasets. We also provide an extensive analysis on the effect of the kernel parameter on the method's accuracy.


Deep Temporal Clustering : Fully Unsupervised Learning of Time-Domain Features

arXiv.org Machine Learning

Unsupervised learning of time series data, also known as temporal clustering, is a challenging problem in machine learning. Here we propose a novel algorithm, Deep Temporal Clustering (DTC), to naturally integrate dimensionality reduction and temporal clustering into a single end-to-end learning framework, fully unsupervised. The algorithm utilizes an autoencoder for temporal dimensionality reduction and a novel temporal clustering layer for cluster assignment. Then it jointly optimizes the clustering objective and the dimensionality reduction objec tive. Based on requirement and application, the temporal clustering layer can be customized with any temporal similarity metric. Several similarity metrics and state-of-the-art algorithms are considered and compared. To gain insight into temporal features that the network has learned for its clustering, we apply a visualization method that generates a region of interest heatmap for the time series. The viability of the algorithm is demonstrated using time series data from diverse domains, ranging from earthquakes to spacecraft sensor data. In each case, we show that the proposed algorithm outperforms traditional methods. The superior performance is attributed to the fully integrated temporal dimensionality reduction and clustering criterion.