Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

Mirzasoleiman, Baharan, Karbasi, Amin, Sarkar, Rik, Krause, Andreas

Feb-14-2020, 17:57:57 GMT–Neural Information Processing Systems

Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable, representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol GreeDI, that is easily implemented using MapReduce style computations.

identifying representative element, massive data, submodular maximization

Neural Information Processing Systems

Feb-14-2020, 17:57:57 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)