Goto

Collaborating Authors

 Clustering


Robust Subgraph Learning by Monitoring Early Training Representations

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have attracted significant attention for their outstanding performance in graph learning and node classification tasks. However, their vulnerability to adversarial attacks, particularly through susceptible nodes, poses a challenge in decision-making. The need for robust graph summarization is evident in adversarial challenges resulting from the propagation of attacks throughout the entire graph. In this paper, we address both performance and adversarial robustness in graph input by introducing the novel technique SHERD (Subgraph Learning Hale through Early Training Representation Distances). SHERD leverages information from layers of a partially trained graph convolutional network (GCN) to detect susceptible nodes during adversarial attacks using standard distance metrics. The method identifies "vulnerable (bad)" nodes and removes such nodes to form a robust subgraph while maintaining node classification performance. Through our experiments, we demonstrate the increased performance of SHERD in enhancing robustness by comparing the network's performance on original and subgraph inputs against various baselines alongside existing adversarial attacks. Our experiments across multiple datasets, including citation datasets such as Cora, Citeseer, and Pubmed, as well as microanatomical tissue structures of cell graphs in the placenta, highlight that SHERD not only achieves substantial improvement in robust performance but also outperforms several baselines in terms of node classification accuracy and computational complexity.


Analysis of singular subspaces under random perturbations

arXiv.org Machine Learning

We present a comprehensive analysis of singular vector and singular subspace perturbations in the context of the signal plus random Gaussian noise matrix model. Assuming a low-rank signal matrix, we extend the Wedin-Davis-Kahan theorem in a fully generalized manner, applicable to any unitarily invariant matrix norm, extending previous results of O'Rourke, Vu and the author. We also obtain the fine-grained results, which encompass the $\ell_\infty$ analysis of singular vectors, the $\ell_{2, \infty}$ analysis of singular subspaces, as well as the exploration of linear and bilinear functions related to the singular vectors. Moreover, we explore the practical implications of these findings, in the context of the Gaussian mixture model and the submatrix localization problem.


Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent, Jordan Boyd-Graber 2, Hal Daumè III 3, Z. Irene Ying

Neural Information Processing Systems

Discovering hierarchical regularities in data is a key problem in interacting with large datasets, modeling cognition, and encoding knowledge. A previous Bayesian solution--Kingman's coalescent--provides a probabilistic model for data represented as a binary tree. Unfortunately, this is inappropriate for data better described by bushier trees. We generalize an existing belief propagation framework of Kingman's coalescent to the beta coalescent, which models a wider range of tree structures. Because of the complex combinatorial search over possible structures, we develop new sampling schemes using sequential Monte Carlo and Dirichlet process mixture models, which render inference efficient and tractable. We present results on synthetic and real data that show the beta coalescent outperforms Kingman's coalescent and is qualitatively better at capturing data in bushy hierarchies.


ca46c1b9512a7a8315fa3c5a946e8265-Reviews.html

Neural Information Processing Systems

Alternatively, if the algorithm performs well against any of the existing AC ones then it would be good to see its performance in various settings and detailed explanation of its properties. The latter would be useful in its own right whether or not it is related to summarizing posterior distribution's characteristics. A few specific comments: Line 19: It is correct to say that the MAP might not be good in situations where the posterior is diffuse, I guess you mean uniform-like? It might also be multimodal, skewed (mode and mean differ), high variance,for example, so the MAP is not necessarily a good choice although it depends on the (underlying) loss function. Line 52: Since a Dirichlet Process is a discrete random probability measure, when sampling from it it induces a random partition (the ties will belong to the same cluster).


Optimistic Concurrency Control for Distributed Unsupervised Learning Stefanie Jegelka

Neural Information Processing Systems

Research on distributed machine learning algorithms has focused primarily on one of two extremes--algorithms that obey strict concurrency constraints or algorithms that obey few or no such constraints. We consider an intermediate alternative in which algorithms optimistically assume that conflicts are unlikely and if conflicts do arise a conflict-resolution protocol is invoked. We view this "optimistic concurrency control" paradigm as particularly appropriate for large-scale machine learning algorithms, particularly in the unsupervised setting. We demonstrate our approach in three problem areas: clustering, feature learning and online facility location. We evaluate our methods via large-scale experiments in a cluster computing environment.


Latent Maximum Margin Clustering

Neural Information Processing Systems

We present a maximum margin framework that clusters data using latent variables. Using latent representations enables our framework to model unobserved information embedded in the data. We implement our idea by large margin learning, and develop an alternating descent algorithm to effectively solve the resultant non-convex optimization problem. We instantiate our latent maximum margin clustering framework with tag-based video clustering tasks, where each video is represented by a latent tag model describing the presence or absence of video tags. Experimental results obtained on three standard datasets show that the proposed method outperforms non-latent maximum margin clustering as well as conventional clustering approaches.


Distributed k-Means and k-Median Clustering on General Topologies

Neural Information Processing Systems

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over the previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms.


53c3bce66e43be4f209556518c2fcb54-Reviews.html

Neural Information Processing Systems

Summary: The paper presents a hard clustering algorithm for clustering batch sequential continuous data. The algorithm is derived by performing a low variance asymptotic analysis of the Gibbs sampling algorithm for the dependent Dirichlet process Gaussian mixture model (DDPMM). This is a well written paper that is easy to follow. The work is technically sound and I only have a few minor concerns. The authors perform a small variance asymptotic analysis for the DDPMM Gibbs sampler and although a similar analysis has previously been performed on the Dirichlet process mixture Gibbs sampler, the model and algorithm considered here are sufficiently different.


Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture Miao Liu MIT

Neural Information Processing Systems

This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a lowvariance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets.


Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

Neural Information Processing Systems

While several papers have investigated computationally and statistically efficient methods for learning Gaussian mixtures, precise minimax bounds for their statistical performance as well as fundamental limits in high-dimensional settings are not well-understood. In this paper, we provide precise information theoretic bounds on the clustering accuracy and sample complexity of learning a mixture of two isotropic Gaussians in high dimensions under small mean separation. If there is a sparse subset of relevant dimensions that determine the mean separation, then the sample complexity only depends on the number of relevant dimensions and mean separation, and can be achieved by a simple computationally efficient procedure. Our results provide the first step of a theoretical basis for recent methods that combine feature selection and clustering.