Goto

Collaborating Authors

 parallel algorithm


Tight Bounds for Maximum Weight Matroid Independent Set and Matching in the Zero Communication Model

Neural Information Processing Systems

Recent years have revealed an unprecedented demand for AI-based technology, leading to a common setting where immense data is distributed across multiple locations. This creates a communication bottleneck among the storage facilities, often aiming to jointly solve tasks of small solution size k from input of astronomically large size n. Motivated by federated and distributed machine learning applications, we study two fundamental optimization problems, maximum weight matroid independent set (MW-IS) and maximum weight matching (MWM), in a zero communication computational model. In this model, the data is dispersed between m servers. Without any communication, each server has to send a message to a central coordinator which is required to compute an optimal solution for the original (large) instance.




Parallel and Efficient Hierarchical k-Median Clustering

Neural Information Processing Systems

As a fundamental unsupervised learning task, hierarchical clustering has been extensively studied in the past decade. In particular, standard metric formulations as hierarchical $k$-center, $k$-means, and $k$-median received a lot of attention and the problems have been studied extensively in different models of computation. Despite all this interest, not many efficient parallel algorithms are known for these problems. In this paper we introduce a new parallel algorithm for the Euclidean hierarchical $k$-median problem that, when using machines with memory $s$ (for $s\in \Omega(\log^2 (n+\Delta+d))$), outputs a hierarchical clustering such that for every fixed value of $k$ the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta)$ factor larger in expectation than that of an optimal solution. Furthermore, we also get that for all $k$ simultanuously the cost of the solution is at most an $O(\min\{d, \log n\} \log \Delta \log (\Delta d n))$ factor bigger that the corresponding optimal solution.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

We acknowledge the reviewer's concern about extending our approach to distributed environments. However, we feel that this should not be a criticism of this paper, whose focus is on multicore systems which pose a unique set of challenges. Furthermore, we have shown that we can handle large graphs without stretching the limits of multicore systems.



Parallel Correlation Clustering on Big Graphs

Neural Information Processing Systems

Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs.



Efficient and Local Parallel Random Walks

Neural Information Processing Systems

Random walks are a fundamental primitive used in many machine learning algorithms with several applications in clustering and semi-supervised learning.