Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
Mirzasoleiman, Baharan, Karbasi, Amin, Sarkar, Rik, Krause, Andreas
–Neural Information Processing Systems
Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable, representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol GreeDI, that is easily implemented using MapReduce style computations.
Neural Information Processing Systems
Feb-14-2020, 17:57:57 GMT
- Technology: