Distributed Submodular Cover: Succinctly Summarizing Massive Data

Jan-14-2025, 22:24:23 GMT–Neural Information Processing Systems

How can one find a subset, ideally as small as possible, that represents a massive dataset well? That is, the subset's utility, measured by a suitable utility function, should be comparable to that of the whole dataset. Here, the utility is assumed to exhibit submodularity, a natural diminishing-returns condition prevalent in many data summarization applications. The classical greedy algorithm is known to provide solutions with logarithmic approximation guarantees compared to the optimum solution. However, this sequential, centralized approach is impractical for truly large-scale problems. In this work, we develop the first distributed algorithm, DISCOVER, for submodular set cover that is easily implementable using MapReduce-style computations.
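For orientation, the classical centralized greedy baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the DISCOVER algorithm: the function name `greedy_submodular_cover`, the toy `SETS` data, and the `coverage` utility are all hypothetical and chosen only to show the idea. The sketch assumes a monotone submodular utility and repeatedly adds the element with the largest marginal gain until a target utility is reached.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence


def greedy_submodular_cover(
    ground_set: Sequence[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
    target: float,
) -> List[Hashable]:
    """Sequential greedy baseline: add the element with the largest
    marginal gain until utility(selected) >= target.

    Assumes `utility` is monotone and submodular (an assumption of this
    sketch). Stops early if no element offers a positive gain.
    """
    selected: List[Hashable] = []
    current = utility(frozenset())
    remaining = list(ground_set)

    while current < target and remaining:
        best, best_gain = None, 0.0
        for e in remaining:
            # Marginal gain of adding e to the current selection.
            gain = utility(frozenset(selected) | {e}) - current
            if gain > best_gain:  # strict: ties keep the earlier element
                best, best_gain = e, gain
        if best is None:
            break  # target unreachable with positive-gain elements
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Hypothetical toy data: each candidate "covers" a set of items.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1, 5}}


def coverage(chosen: FrozenSet[Hashable]) -> float:
    """Number of distinct items covered; a monotone submodular function."""
    covered = set()
    for name in chosen:
        covered |= SETS[name]
    return float(len(covered))


summary = greedy_submodular_cover(list(SETS), coverage, target=5.0)
```

Each round scans every remaining element and depends on the previous round's choice, which is the sequential, centralized bottleneck the abstract refers to.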

massive data, submodular cover, succinctly

Technology:
- Information Technology > Artificial Intelligence (0.82)