Goto

Collaborating Authors

 efficient streaming algorithm


An Efficient Streaming Algorithm for the Submodular Cover Problem

Neural Information Processing Systems

We initiate the study of the classical Submodular Cover (SC) problem in the data streaming model which we refer to as the Streaming Submodular Cover (SSC). We show that any single pass streaming algorithm using sublinear memory in the size of the stream will fail to provide any non-trivial approximation guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only seek to find a partial cover. We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance supported by numerical evidence. Our algorithm finds solutions that are competitive with the near-optimal offline greedy algorithm despite requiring only a single pass over the data stream. In our numerical experiments, we evaluate the performance of ESC-Streaming on active set selection and large-scale graph cover problems.


Efficient Streaming Algorithms for Graphlet Sampling

Neural Information Processing Systems

Given a graph G and a positive integer k, the Graphlet Sampling problem asks to sample a connected induced k -vertex subgraph of G uniformly at random.Graphlet sampling enhances machine learning applications by transforming graph structures into feature vectors for tasks such as graph classification and subgraph identification, boosting neural network performance, and supporting clustered federated learning by capturing local structures and relationships.A recent work has shown that the problem admits an algorithm that preprocesses G in time O(nk 2 \log k m), and draws one sample in expected time k {O(k)} \log n, where n V(G) and m E(G) . Such an algorithm relies on the assumption that the input graph fits into main memory and it does not seem to be straightforward to adapt it to very large graphs. We consider Graphlet Sampling in the semi-streaming setting, where we have a memory of M \Omega(n \log n) words, and G can be only read through sequential passes over the edge list. We develop a semi-streaming algorithm that preprocesses G in p {O}(\log n) passes and samples \Theta(M k {-O(k)}) independent uniform k -graphlets in O(k) passes. For constant k, both phases run in time O((n m)\log n) .


Reviews: An Efficient Streaming Algorithm for the Submodular Cover Problem

Neural Information Processing Systems

The problem studied is the min set cover where there is a family of sets that cover some universe and the goal is to find a family of sets of minimal size that covers at least Q elements in the universe. It is well known that this problem is NP hard and that no polynomial time algorithm can obtain an approximation guarantee better than log n, under reasonable computational complexity assumptions. In the streaming setting, defined in this paper for this problem for the first time, the goal is to solve the set cover problem using a single pass on the data using sublinear memory when the assumption is that sets arrive in an adversarial order. Motivated by impossibility results for this problem, shown in this paper, the authors define the Streaming Bicriteria Submodular Cover (SBSC) problem where the goal is to obtain (1-\epsilon,delta) bi-criteria approximations, i.e. cover a (1-\epsilon) fraction of the universe using \delta factor of the minimum sets of the optimal solution. The main result is a streaming algorithm in the model defined which covers a (1-\epsilon) fraction of universe using 2/\epsilon sets of the optimal solution.


An Efficient Streaming Algorithm for the Submodular Cover Problem

Norouzi-Fard, Ashkan, Bazzi, Abbas, Bogunovic, Ilija, Halabi, Marwa El, Hsieh, Ya-Ping, Cevher, Volkan

Neural Information Processing Systems

We initiate the study of the classical Submodular Cover (SC) problem in the data streaming model which we refer to as the Streaming Submodular Cover (SSC). We show that any single pass streaming algorithm using sublinear memory in the size of the stream will fail to provide any non-trivial approximation guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only seek to find a partial cover. We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance supported by numerical evidence. Our algorithm finds solutions that are competitive with the near-optimal offline greedy algorithm despite requiring only a single pass over the data stream.