Zadimoghaddam, Morteza
The Cost of Consistency: Submodular Maximization with Constant Recourse
Dütting, Paul, Fusco, Federico, Lattanzi, Silvio, Norouzi-Fard, Ashkan, Svensson, Ola, Zadimoghaddam, Morteza
In this work, we study online submodular maximization and how the requirement of maintaining a stable solution impacts the attainable approximation. In particular, we seek bounds on the best-possible approximation ratio when the algorithm is allowed to make at most a constant number of updates per step. We show a tight information-theoretic bound of $\tfrac{2}{3}$ for general monotone submodular functions, and an improved (also tight) bound of $\tfrac{3}{4}$ for coverage functions. Since both of these bounds are attained by algorithms that are not poly-time, we also give a poly-time randomized algorithm that achieves a $0.51$-approximation. Combined with an information-theoretic hardness of $\tfrac{1}{2}$ for deterministic algorithms from prior work, our results thus show a separation between deterministic and randomized algorithms, both information-theoretically and for poly-time algorithms.
Consistent Submodular Maximization
Dütting, Paul, Fusco, Federico, Lattanzi, Silvio, Norouzi-Fard, Ashkan, Zadimoghaddam, Morteza
Submodular optimization is a powerful framework for modeling and solving problems that exhibit the widespread diminishing returns property. Thanks to its effectiveness, it has been applied across diverse domains, including video analysis [Zheng et al., 2014], data summarization [Lin and Bilmes, 2011, Bairi et al., 2015], sparse reconstruction [Bach, 2010, Das and Kempe, 2011], and active learning [Golovin and Krause, 2011, Amanatidis et al., 2022]. In this paper, we focus on submodular maximization under cardinality constraints: given a submodular function f, a universe of elements V, and a cardinality constraint k, the goal is to find a set S of at most k elements that maximizes f(S). Submodular maximization under cardinality constraints is NP-hard; nevertheless, efficient approximation algorithms exist for this task in both the centralized and the streaming setting [Nemhauser et al., 1978, Badanidiyuru et al., 2014, Kazemi et al., 2019]. One aspect of efficient approximation algorithms for submodular maximization that has received little attention so far is the stability of the solution. In fact, for some of the known algorithms, even adding a single element to the universe V may completely change the final output (see Appendix A for some examples). Unfortunately, this is problematic in many real-world applications where consistency is a fundamental system requirement.
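For context, here is a minimal sketch of the classic greedy baseline for cardinality-constrained monotone submodular maximization (Nemhauser et al., 1978) that this line of work builds on; it is not the consistent algorithm developed in the paper, and the value oracle `f` and the toy coverage function are illustrative assumptions.

```python
def greedy_max(f, V, k):
    """Repeatedly add the element with the largest marginal gain, up to k elements."""
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for e in V - S:
            gain = f(S | {e}) - f(S)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:   # no element has a positive marginal gain
            break
        S.add(best)
    return S

# Toy usage with a coverage function f(S) = number of items covered by S.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
print(greedy_max(f, set(cover), k=2))   # a 2-element set with near-optimal coverage
```

Note that adding a single new element to `cover` can change the greedy choices from the very first step, which is exactly the instability issue the paper addresses.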
GIST: Greedy Independent Set Thresholding for Diverse Data Summarization
Fahrbach, Matthew, Ramalingam, Srikumar, Zadimoghaddam, Morteza, Ahmadian, Sara, Citovsky, Gui, DeSalvo, Giulia
Subset selection is a challenging optimization problem with a wide variety of applications in machine learning, including feature selection, recommender systems, news aggregation, drug discovery, data summarization, and designing pretraining sets for large language models (Anil et al., 2023). Data sampling in particular is a salient problem due to unprecedented and continuous data collection. For example, the LiDAR and imaging devices in a single self-driving vehicle can easily capture ~80 terabytes of data per day (Kazhamiaka et al., 2021). In most subset selection tasks, we rely on the weight (or utility) of the objects to rank one object over another, and also to avoid selecting duplicate or near-duplicate objects. If we select a small subset, we also want to ensure that it is a good representation of the original set. These utility, diversity, and coverage criteria can be expressed through objective functions, and the interesting research lies in developing efficient algorithms with strong approximation guarantees. The underlying machinery used in constrained subset selection algorithms shares many similarities with techniques from other areas of combinatorial optimization such as submodular maximization, $k$-center clustering, and convex hull approximations. In this work, we study the problem of selecting a set of points in a metric space that maximizes an objective combining their utility and a minimum pairwise-distance diversity measure.
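To make the objective concrete, here is a hypothetical illustration of the kind of trade-off studied here: pick high-utility points while enforcing a minimum pairwise distance. The threshold `tau`, the greedy order, and the toy data are simplifying assumptions, not the actual GIST algorithm or its guarantee.

```python
import math

def greedy_threshold_select(points, utility, dist, k, tau):
    """Scan points in decreasing utility; keep one only if it is >= tau from all kept points."""
    selected = []
    for p in sorted(points, key=utility, reverse=True):
        if len(selected) == k:
            break
        if all(dist(p, q) >= tau for q in selected):
            selected.append(p)
    return selected

# Toy example: 2-D points, utility = first coordinate.
pts = [(5.0, 0.0), (4.9, 0.1), (1.0, 3.0), (0.0, 5.0)]
euclid = lambda a, b: math.dist(a, b)
print(greedy_threshold_select(pts, utility=lambda p: p[0], dist=euclid, k=2, tau=1.0))
# keeps (5.0, 0.0) and skips the near-duplicate (4.9, 0.1)
```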
Fully Dynamic Submodular Maximization over Matroids
Dütting, Paul, Fusco, Federico, Lattanzi, Silvio, Norouzi-Fard, Ashkan, Zadimoghaddam, Morteza
Maximizing monotone submodular functions under a matroid constraint is a classic algorithmic problem with multiple applications in data mining and machine learning. We study this problem in the fully dynamic setting, where elements can be both inserted and deleted in real time. Our main result is a randomized algorithm that maintains an efficient data structure with an $\tilde{O}(k^2)$ amortized update time (in the number of additions and deletions) and yields a $4$-approximate solution, where $k$ is the rank of the matroid.
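For reference, a minimal sketch of the standard static greedy under a matroid constraint (a 1/2-approximation for monotone submodular functions); the paper's contribution is maintaining a comparable solution while elements are inserted and deleted, which this static sketch does not attempt. The membership oracle `is_independent` and the toy function are illustrative assumptions.

```python
def matroid_greedy(f, V, is_independent):
    """Static greedy: add the best element whose addition keeps the set independent."""
    S = set()
    candidates = set(V)
    while candidates:
        feasible = [e for e in candidates if is_independent(S | {e})]
        if not feasible:
            break
        best = max(feasible, key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) <= 0:
            break
        S.add(best)
        candidates.remove(best)
    return S

# Example: uniform (rank-2) matroid and a simple monotone submodular function.
f = lambda S: len(S) ** 0.5
print(matroid_greedy(f, {"a", "b", "c"}, is_independent=lambda S: len(S) <= 2))
```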
Submodular Streaming in All its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity
Kazemi, Ehsan, Mitrovic, Marko, Zadimoghaddam, Morteza, Lattanzi, Silvio, Karbasi, Amin
Streaming algorithms are generally judged by the quality of their solution, their memory footprint, and their computational complexity. In this paper, we study the problem of maximizing a monotone submodular function in the streaming setting under a cardinality constraint $k$. We first propose Sieve-Streaming++, which requires just one pass over the data, keeps only $O(k)$ elements, and achieves the tight $(1/2)$-approximation guarantee. The best previously known streaming algorithms either achieve a suboptimal $(1/4)$-approximation with $\Theta(k)$ memory or the optimal $(1/2)$-approximation with $O(k\log k)$ memory. Next, we show that by buffering a small fraction of the stream and applying a careful filtering procedure, one can drastically reduce the number of adaptive computational rounds, thus substantially lowering the computational complexity of Sieve-Streaming++. We then generalize our results to the more challenging multi-source streaming setting. We show how one can achieve the tight $(1/2)$-approximation guarantee with $O(k)$ shared memory while minimizing not only the required rounds of computation but also the total number of communicated bits. Finally, we demonstrate the efficiency of our algorithms on real-world data summarization tasks for multi-source streams of tweets and YouTube videos.
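A simplified sketch of the thresholding idea behind sieve-style streaming algorithms: for a guess `v` of the optimum, accept an arriving element only if its marginal gain is large enough to keep the solution on track for value v/2. Sieve-Streaming++ additionally maintains many guesses of v in parallel and prunes them to keep $O(k)$ total memory; that bookkeeping is omitted here, and the toy coverage function is an illustrative assumption.

```python
def threshold_stream(f, stream, k, v):
    """Single-threshold streaming: accept e if its gain meets (v/2 - f(S)) / (k - |S|)."""
    S = []
    for e in stream:
        if len(S) == k:
            break
        gain = f(S + [e]) - f(S)
        if gain >= (v / 2 - f(S)) / (k - len(S)):
            S.append(e)
    return S

# Toy usage with a coverage function and a (hypothetical) guess v of the optimum value.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}, "d": {5}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
print(threshold_stream(f, stream="abcd", k=2, v=4))   # keeps ['a', 'b'], value 3 >= v/2
```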
Data Summarization at Scale: A Two-Stage Submodular Approach
Mitrovic, Marko, Kazemi, Ehsan, Zadimoghaddam, Morteza, Karbasi, Amin
The sheer scale of modern datasets has resulted in a dire need for summarization techniques that identify representative elements in a dataset. Fortunately, the vast majority of data summarization tasks satisfy an intuitive diminishing-returns condition known as submodularity, which allows us to find nearly optimal solutions in linear time. We focus on a two-stage submodular framework where the goal is to use some given training functions to reduce the ground set, so that optimizing new functions (drawn from the same distribution) over the reduced set provides almost as much value as optimizing them over the entire ground set. In this paper, we develop the first streaming and distributed solutions to this problem. In addition to providing strong theoretical guarantees, we demonstrate both the utility and the efficiency of our algorithms on real-world tasks, including image summarization and ride-share optimization.
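A simplified sketch of the two-stage idea (not the paper's streaming or distributed algorithms): use the training functions to shrink the ground set, then run an ordinary greedy for a fresh function on the reduced set only. The naive reduction below, which keeps the union of per-function greedy solutions, is an illustrative assumption.

```python
def _greedy(f, V, k):
    """Standard cardinality-constrained greedy (helper for the sketch below)."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in V - S}
        if not gains or max(gains.values()) <= 0:
            break
        S.add(max(gains, key=gains.get))
    return S

def reduce_ground_set(training_functions, V, k):
    """Stage 1 (naive version): keep the union of the greedy solutions of the training functions."""
    R = set()
    for f in training_functions:
        R |= _greedy(f, V, k)
    return R

def two_stage_optimize(new_f, R, k):
    """Stage 2: optimize a new function over the reduced ground set R instead of all of V."""
    return _greedy(new_f, R, k)
```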
Scalable Feature Selection via Distributed Diversity Maximization
Zadeh, Sepehr Abbasi (Sharif University of Technology) | Ghadiri, Mehrdad (Sharif University of Technology) | Mirrokni, Vahab (Google Research) | Zadimoghaddam, Morteza (Google Research)
Feature selection is a fundamental problem in machine learning and data mining. The majority of feature selection algorithms are designed to run on a single machine (the centralized setting) and are therefore less applicable to very large datasets. Although there are some distributed methods for this problem, most of them distribute the data horizontally, which is not suitable for datasets with a large number of features and a small number of instances. Thus, in this paper, we introduce a novel vertically distributable feature selection method in order to speed up this process and handle very large datasets in a scalable manner. In general, feature selection methods aim at selecting relevant and non-redundant features (Minimum Redundancy and Maximum Relevance). It is much harder to account for redundancy in a vertically distributed setting than in a centralized one, since there is no global access to the whole data. To the best of our knowledge, this is the first attempt to solve the feature selection problem with a vertically distributed filter method that handles redundancy while producing results consistently comparable to those of centralized methods. We formalize the feature selection problem as a diversity maximization problem by introducing a mutual-information-based metric distance on the features. We show the effectiveness of our method through an extensive empirical study; in particular, our distributed method outperforms state-of-the-art centralized feature selection algorithms on a variety of datasets. From a theoretical point of view, we prove that the greedy algorithm used in our method achieves, with high probability, an approximation factor of 1/4 for the diversity maximization problem in a distributed setting. Furthermore, we improve this to an expected approximation factor of 8/25 by using multiplicity in our distribution.
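For intuition, a hedged sketch of a standard greedy for (max-sum) diversity maximization: repeatedly add the feature farthest, in total distance, from those already chosen. The paper runs its greedy distributedly over a mutual-information-based distance; the placeholder metric and toy data below are assumptions, not the paper's construction.

```python
def greedy_diversity(points, dist, k):
    """Greedily grow a set by adding the point with the largest total distance to the current set."""
    if not points or k <= 0:
        return []
    remaining = list(points)
    selected = [remaining.pop(0)]   # arbitrary seed point
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda p: sum(dist(p, q) for q in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with 1-D "features" and absolute-difference distance.
print(greedy_diversity([0.1, 0.2, 0.9, 0.5], dist=lambda a, b: abs(a - b), k=3))
```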
Fast Distributed Submodular Cover: Public-Private Data Summarization
Mirzasoleiman, Baharan, Zadimoghaddam, Morteza, Karbasi, Amin
In this paper, we introduce the public-private framework for data summarization, motivated by privacy concerns in personalized recommender systems and online social services. Such systems usually have access to massive data generated by a large pool of users. A major fraction of the data is public and is visible to (and can be used for) all users. However, each user can also contribute some private data that should not be shared with other users, to ensure her privacy. The goal is to provide a succinct summary of a massive dataset, ideally as small as possible, from which customized summaries can be built for each user, i.e., summaries that can contain elements from the public data (for diversity) and the user's private data (for personalization). To formalize this challenge, we assume that the scoring function according to which a user evaluates the utility of her summary satisfies submodularity, a widely used notion in data summarization applications. Thus, we model the data summarization targeted to each user as an instance of a submodular cover problem. However, when the data is massive it is infeasible to use the centralized greedy algorithm to find a customized summary even for a single user. Moreover, for a large pool of users, it is too time-consuming to find such summaries separately. Instead, we develop a fast distributed algorithm for submodular cover, FASTCOVER, that provides a succinct summary in one shot and for all users. We show that the solution provided by FASTCOVER is competitive with that of the centralized algorithm while using a number of rounds that is exponentially smaller than in state-of-the-art results. Moreover, we have implemented FASTCOVER in Spark to demonstrate its practical performance on a number of concrete applications, including personalized location recommendation, personalized movie recommendation, and dominating set, on tens of millions of data points and a varying number of users.
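A minimal sketch of the centralized greedy for submodular cover, the baseline that becomes infeasible at scale and that FASTCOVER parallelizes: keep adding the element with the largest marginal gain until the required utility Q is met. The value oracle and toy data are illustrative assumptions.

```python
def greedy_cover(f, V, Q):
    """Add the highest-gain element until f(S) reaches the coverage target Q."""
    S = set()
    remaining = set(V)
    while f(S) < Q and remaining:
        best = max(remaining, key=lambda e: f(S | {e}) - f(S))
        if f(S | {best}) - f(S) <= 0:
            break   # the target Q is unreachable
        S.add(best)
        remaining.remove(best)
    return S

# Toy usage: cover all 5 items of the ground set.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {5}}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
print(greedy_cover(f, set(cover), Q=5))   # selects the cover {'a', 'c'}
```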
Horizontally Scalable Submodular Maximization
Lucic, Mario, Bachem, Olivier, Zadimoghaddam, Morteza, Krause, Andreas
A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: the capacity (the number of instances that can fit in memory) must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under a fixed capacity. The proposed framework applies to a broad class of algorithms and constraints, and provides theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution.
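To illustrate the capacity issue, here is a sketch of the standard two-round distributed scheme: partition the data, run a single-machine routine on each part, then run it again on the union of the partial solutions. The final round must hold roughly num_machines * k elements on one machine, which is the growing-capacity drawback described above; the paper's own fixed-capacity framework is not reproduced here, and `single_machine_alg` (e.g. the greedy sketched after the first abstract on this page) is a placeholder.

```python
import random

def two_round_distributed(f, V, k, num_machines, single_machine_alg):
    """Standard two-round scheme: per-machine runs, then one run over the merged partial solutions."""
    items = list(V)
    random.shuffle(items)
    parts = [set(items[i::num_machines]) for i in range(num_machines)]   # random partition
    partial = [single_machine_alg(f, p, k) for p in parts]               # round 1, in parallel
    merged = set().union(*partial)       # round 2 holds up to num_machines * k elements on one machine
    return max(partial + [single_machine_alg(f, merged, k)], key=f)
```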
Optimal Coalition Structure Generation in Cooperative Graph Games
Bachrach, Yoram (Microsoft Research Cambridge) | Kohli, Pushmeet (Microsoft Research Cambridge) | Kolmogorov, Vladimir (Institute of Science and Technology) | Zadimoghaddam, Morteza (Massachusetts Institute of Technology)
Representation languages for coalitional games are a key research area in algorithmic game theory. There is an inherent tradeoff between how general a language is, allowing it to capture more elaborate games, and how hard it is computationally to optimize and solve such games. One prominent such language is the simple yet expressive Weighted Graph Games (WGGs) representation (Deng and Papadimitriou, 1994), which maintains knowledge about synergies between agents in the form of an edge-weighted graph. We consider the problem of finding the optimal coalition structure in WGGs. The agents in such games are vertices in a graph, and the value of a coalition is the sum of the weights of the edges present between coalition members. The optimal coalition structure is a partition of the agents into coalitions that maximizes the sum of the utilities obtained by the coalitions. We show that finding the optimal coalition structure is not only hard for general graphs, but is also intractable for restricted families such as planar graphs, which are amenable to many other combinatorial problems. We then provide algorithms with constant-factor approximations for planar, minor-free, and bounded-degree graphs.
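A direct reading of the definitions above: in a Weighted Graph Game the agents are vertices, a coalition's value is the total weight of edges with both endpoints inside it, and a coalition structure's value is the sum over its coalitions. Finding the best partition is the hard part; this sketch only scores a given partition, on a hypothetical toy graph.

```python
def coalition_value(edges, coalition):
    """edges: dict mapping frozenset({u, v}) -> weight; returns the weight inside the coalition."""
    C = set(coalition)
    return sum(w for e, w in edges.items() if e <= C)

def structure_value(edges, partition):
    """Value of a coalition structure: sum of the values of its coalitions."""
    return sum(coalition_value(edges, C) for C in partition)

# Toy example: the negative edge pushes b and c into different coalitions.
edges = {frozenset({"a", "b"}): 3.0, frozenset({"b", "c"}): -2.0, frozenset({"a", "c"}): 1.0}
print(structure_value(edges, [{"a", "b"}, {"c"}]))   # 3.0
print(structure_value(edges, [{"a", "b", "c"}]))     # 2.0 (the grand coalition is not optimal here)
```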