Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

arXiv.org Machine Learning

We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ε^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ε^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ε$-KKT point with $\tilde{O}(ε^{-9})$ oracle complexity.
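The abstract reduces a constrained lower-level problem to a minimax one via Lagrangian duality. Below is a minimal sketch of that single step; it is not the paper's penalty-based method. Assume a hypothetical toy lower-level problem $\min_y (y-2)^2$ s.t. $y \le 1$. It becomes $\min_y \max_{\lambda \ge 0} (y-2)^2 + \lambda (y-1)$, which plain projected gradient descent-ascent solves. The function name, step size, and iteration count are our own illustrative choices.

```python
# Minimal sketch (not the paper's method): the constrained lower-level problem
#   min_y g(y)  s.t.  h(y) <= 0
# is rewritten as the minimax problem
#   min_y max_{lam >= 0}  g(y) + lam * h(y)
# and solved with simultaneous projected gradient descent-ascent (GDA).
# Hypothetical toy instance: g(y) = (y - 2)^2, h(y) = y - 1.

def lagrangian_gda(step=0.1, iters=2000):
    y, lam = 0.0, 0.0
    for _ in range(iters):
        grad_y = 2.0 * (y - 2.0) + lam   # partial derivative of the Lagrangian in y
        grad_lam = y - 1.0               # partial derivative in lam, i.e. h(y)
        y = y - step * grad_y            # descent step on the primal variable
        lam = max(0.0, lam + step * grad_lam)  # ascent step, projected onto lam >= 0
    return y, lam

y_star, lam_star = lagrangian_gda()
print(f"y* ~ {y_star:.4f}, lambda* ~ {lam_star:.4f}")
```

The KKT conditions give $y^\star = 1$ and $2(y^\star - 2) + \lambda^\star = 0$, i.e. $\lambda^\star = 2$. The iterates settle there for this strongly convex toy. The paper targets lower-level problems without strong convexity, where such plain GDA guarantees need not hold.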


Contents of main article and appendix

Neural Information Processing Systems

We start by fleshing out the connection between strong convexity and smoothness charted in Lemma 1.

Lemma 1. If $F$ is -strongly convex w.r.t.
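Lemma 1 is cut off in this excerpt, including its strong-convexity parameter. We therefore assume it refers to the standard conjugate duality: if $F$ is $\sigma$-strongly convex with respect to a norm, then its Fenchel conjugate $F^*$ is $(1/\sigma)$-smooth with respect to the dual norm. Here $\sigma$ is our own placeholder symbol. A worked Euclidean example makes the correspondence concrete:

```latex
F(x) = \frac{\sigma}{2}\|x\|_2^2, \qquad
F^*(y) = \sup_x \Big\{ \langle x, y\rangle - \frac{\sigma}{2}\|x\|_2^2 \Big\}
       = \frac{1}{2\sigma}\|y\|_2^2 \quad (\text{attained at } x = y/\sigma),
\\[4pt]
\nabla F^*(y) = \frac{y}{\sigma}
\;\Longrightarrow\;
\|\nabla F^*(y) - \nabla F^*(y')\|_2 = \frac{1}{\sigma}\,\|y - y'\|_2 .
```

The larger the strong-convexity modulus $\sigma$ of $F$, the smaller the smoothness constant $1/\sigma$ of $F^*$.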


facts

Neural Information Processing Systems

Let $f$ be a non-negative submodular function on $[n]$ that is bounded above by 1. First, suppose that the $X_i$ are monotone increasing. Construct a sequence $X'_i$ as follows. If $i \notin I$, set $X'_i = X'_{i-1}$. If $i \in I$, set $X'_i = X'_{i-1} \cup (X_i \setminus X_{i-1})$. For the monotone decreasing case, consider the submodular function $g(X) = f([n] \setminus X)$ and set $Y_i = [n] \setminus X_i$.
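The two constructions above can be checked on a small ground set. The sketch below brute-forces the submodularity inequality for a hypothetical normalized coverage function, which is non-negative and bounded by 1. It confirms that the complement $g(X) = f([n] \setminus X)$ stays submodular, then builds $X'_i$ from an increasing chain. The base case $X'_0 = X_0$ is our assumption, since the excerpt does not state it.

```python
from itertools import combinations

def all_subsets(n):
    """All subsets of [n] = {0, ..., n-1}, as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

def is_submodular(f, n, tol=1e-12):
    """Brute-force check of f(A) + f(B) >= f(A | B) + f(A & B) for all A, B."""
    subsets = all_subsets(n)
    return all(f(a) + f(b) >= f(a | b) + f(a & b) - tol for a in subsets for b in subsets)

def complement_function(f, n):
    """g(X) = f([n] minus X), used for the monotone decreasing case."""
    ground = frozenset(range(n))
    return lambda x: f(ground - x)

def restricted_chain(chain, index_set):
    """X'_i = X'_{i-1} if i not in I, else X'_{i-1} | (X_i - X_{i-1}).
    Assumption: the base case is X'_0 = X_0 (not stated in the excerpt)."""
    primed = [frozenset(chain[0])]
    for i in range(1, len(chain)):
        nxt = primed[-1]
        if i in index_set:
            nxt = nxt | (frozenset(chain[i]) - frozenset(chain[i - 1]))
        primed.append(nxt)
    return primed

# Hypothetical example: element i of [4] covers COVER[i]; f is the covered fraction.
COVER = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
UNIVERSE_SIZE = 4

def coverage(x):
    covered = set()
    for i in x:
        covered |= COVER[i]
    return len(covered) / UNIVERSE_SIZE

g = complement_function(coverage, 4)
print(is_submodular(coverage, 4), is_submodular(g, 4))  # both True

chain = [set(), {0}, {0, 1}, {0, 1, 2}, {0, 1, 2, 3}]
print(restricted_chain(chain, {1, 3}))
```

The brute-force check is exponential in $n$, so it is only meant for sanity-checking small instances.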


Faster approximate subgraph counts with privacy

Neural Information Processing Systems

One of the most common problems studied in the context of differential privacy for graph data is counting the number of non-induced embeddings of a subgraph in a given graph. These counts have very high global sensitivity. Therefore, adding noise based on powerful alternative techniques, such as smooth sensitivity and higher-order local sensitivity, has been shown to give significantly better accuracy. However, all these alternatives to global sensitivity become computationally very expensive, and to date efficient polynomial-time algorithms are known only for a few selected subgraphs, such as triangles, k-triangles, and k-stars. In this paper, we show that good approximations to these sensitivity metrics can still be used to get private algorithms. Using this approach, we obtain much faster algorithms for privately counting the number of triangles in real-world social networks, which can be easily parallelized. We also give a private polynomial-time algorithm for counting any constant-size subgraph using less noise than the global sensitivity; we show this can be improved significantly for counting paths in special classes of graphs.
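To see why global sensitivity is so pessimistic for triangle counts, consider edge-level neighboring graphs. Toggling one edge $(u, v)$ changes the count by $|N(u) \cap N(v)|$. The global sensitivity is therefore $n - 2$, while the local sensitivity of a given graph is the maximum number of common neighbors over all pairs. The sketch below computes both quantities on a toy graph and releases the count with the Laplace mechanism at global sensitivity. All function names are hypothetical. Releasing with the local sensitivity alone would not be differentially private, which is the gap that smooth sensitivity and the paper's approximations address.

```python
import random
from itertools import combinations

def build_adjacency(n, edges):
    """Undirected adjacency sets for vertices 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def triangle_count(adj):
    """Count each triangle {u, v, w} once via u < v < w (O(n^3), toy sizes only)."""
    n = len(adj)
    return sum(1 for u, v, w in combinations(range(n), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])

def local_sensitivity_triangles(adj):
    """Max over vertex pairs of |N(u) & N(v)|: the largest change from toggling one edge."""
    n = len(adj)
    return max((len(adj[u] & adj[v]) for u, v in combinations(range(n), 2)), default=0)

def laplace_release(value, sensitivity, epsilon, rng):
    """Laplace mechanism; Exp(1) - Exp(1) is a standard Laplace(0, 1) sample."""
    scale = sensitivity / epsilon
    return value + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant edge 2-3.
n = 4
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = build_adjacency(n, edges)
count = triangle_count(adj)
ls = local_sensitivity_triangles(adj)
gs = n - 2  # global sensitivity of the triangle count under edge privacy
noisy = laplace_release(count, gs, epsilon=1.0, rng=random.Random(0))
print(count, ls, gs, noisy)
```

On sparse real-world graphs the gap between `ls` and `gs` is typically much larger than in this toy example.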



Triple Eagle: Simple, Fast and Practical Budget-Feasible Mechanisms

Neural Information Processing Systems

We revisit the classical problem of designing Budget-Feasible Mechanisms (BFMs) for submodular valuation functions, which has been extensively studied since the seminal paper of Singer [FOCS'10] due to its wide applications in crowdsourcing and social marketing. We propose TripleEagle, a novel algorithmic framework for designing BFMs, based on which we present several simple yet effective BFMs that achieve better approximation ratios than the state-of-the-art work for both monotone and non-monotone submodular valuation functions. Moreover, our BFMs are the first in the literature to achieve linear complexities while ensuring obvious strategyproofness, making them more practical than the previous BFMs. We conduct extensive experiments to evaluate the empirical performance of our BFMs, and the experimental results strongly demonstrate the efficiency and effectiveness of our approach.


Learning on the Edge: Online Learning with Stochastic Feedback Graphs

Neural Information Processing Systems

The framework of feedback graphs is a generalization of sequential decision-making with bandit or full-information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round, every edge in the graph is either realized or not, with a distinct probability for each edge.
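The per-round feedback model described above can be sketched as follows. Each directed edge $(i, j)$ carries its own realization probability $p_{ij}$. After the learner plays action $i$, it observes the losses of the out-neighbors of $i$ in that round's realized graph. Independence of edges within and across rounds is our assumption, suggested by the Erdős-Rényi analogy. The function names and example probabilities are hypothetical.

```python
import random

def sample_feedback_graph(edge_probs, rng):
    """Realize each directed edge (i, j) with its own probability p_ij.
    Assumption: edges are realized independently of one another and across rounds."""
    return {edge for edge, p in edge_probs.items() if rng.random() < p}

def observed_losses(action, realized_edges, losses):
    """Losses revealed after playing `action`: those of its realized out-neighbors."""
    return {j: losses[j] for (i, j) in realized_edges if i == action}

# Hypothetical two-action instance; (i, i) self-loops model observing one's own loss.
rng = random.Random(42)
edge_probs = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 1.0}
losses = [0.2, 0.7]
for t in range(3):
    realized = sample_feedback_graph(edge_probs, rng)
    print(t, observed_losses(0, realized, losses))
```

Setting every probability to 1 recovers a fixed feedback graph. Setting only the self-loop probabilities to 1 recovers bandit feedback.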


Supplementary proofs from Section 2

Neural Information Processing Systems

We begin with a simple lemma showing that the values of the levels are monotone.

Lemma A.1. First, we note that the second part of the lemma holds by lines 15-16. Let $z^i_l$ and $z^i_h$ be the values of $z_l$ and $z_h$ in Algorithm 2 at line 9 on window $i$. There are two cases, depending on whether an element $e^\star$ was added to the solution or not. Suppose no element $e^\star$ was added to the solution. Then all the levels remain the same.


PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits

Neural Information Processing Systems

In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedback, and the reward function depends linearly on a few coordinates of the covariates of the actions. This has applications in many real-world sequential decision-making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called POPART that enjoys a tighter ℓ1 recovery guarantee compared to Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms that enjoy improved regret upper bounds over the state of the art (Hao et al., 2020), especially w.r.t. the geometry of the given action set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between upper and lower bounds in prior work.
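The abstract does not spell out the estimator. As an assumption, the sketch below follows the pattern of averaging $Q^{-1} a_t r_t$, where $Q = \mathbb{E}[a a^\top]$ is the known population covariance of the sampling design, and then hard-thresholding small coordinates. The plain mean stands in for whatever robust mean estimator the paper may use, so this is a simplified illustration, not POPART itself. The function name, threshold, and toy data are hypothetical.

```python
import numpy as np

def popart_style_sketch(actions, rewards, Q, threshold):
    """Simplified sketch (assumption, not the paper's exact estimator).
    1) Pilot estimate: mean over t of Q^{-1} a_t r_t. It is unbiased for theta when
       r_t = a_t^T theta + zero-mean noise and Q = E[a a^T] (population covariance).
    2) Hard-threshold coordinates with magnitude <= threshold to exploit sparsity."""
    Q_inv = np.linalg.inv(Q)
    pilot = ((actions @ Q_inv.T) * rewards[:, None]).mean(axis=0)
    return np.where(np.abs(pilot) > threshold, pilot, 0.0)

# Hypothetical toy problem: 2-sparse theta, Rademacher design (so Q = I).
rng = np.random.default_rng(0)
d, n = 10, 20000
theta = np.zeros(d)
theta[0], theta[1] = 1.0, -1.0
actions = rng.choice([-1.0, 1.0], size=(n, d))
rewards = actions @ theta + 0.1 * rng.standard_normal(n)
theta_hat = popart_style_sketch(actions, rewards, np.eye(d), threshold=0.3)
print(np.round(theta_hat, 3))
```

Using the population covariance rather than the empirical one keeps the pilot estimate unbiased coordinate by coordinate, which is what makes simple thresholding reasonable here.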


where the last inequality follows from the fact that $U_{ij} \le 1$. Also, for any $i \in [n]$ and $j \in [k]$, we have $x_i \ldots \hat{\mu}_j$

Neural Information Processing Systems

To prove Lemma 2 we start by proving a few inequalities. Since $A$ is an ( 1, 2, Q)-solver, using Definition 4 and Taylor's expansion, we get, for any $i \in [n]$ and $j \in [k]$, …

In this section we present and prove a few auxiliary results, which will be used in the proofs of our main results. We start with the following standard concentration inequalities.

… $R^2$, (32) if $n \ge c \log(1/\delta) \ldots$, where $c > 0$ is some absolute constant.

The following locality lemma states that the fuzzy $k$-means function is strictly increasing.

Lemma 5. Let $(X, P^\star)$ be a clustering instance, where $P^\star$ refers to the optimal solution for the fuzzy $k$-means problem (namely, the solution that minimizes the objective in (2)).

Output: $\hat{\mu}_j$
1: Initialize $S \leftarrow \emptyset$.
2: for $s = 1, 2, \ldots, m$ do
3: Sample $i$ uniformly at random from $[n]$ and update $S \leftarrow S \cup \{i\}$.

Next, we analyze the performance of Algorithm 6, which estimates the center of a given cluster using a set of randomly sampled elements. Note that this algorithm is used as a subroutine in Algorithm 1.

Lemma 6 (Estimate of mean using uniform sampling). Let $(X, P)$ be a consistent center-based clustering instance, and let $\delta \in (0,1)$.
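Algorithm 6 is cut off after step 3, so its estimation step is not shown. Steps 1-3 are mirrored directly in the sketch below. The final step is our assumption: the standard fuzzy $k$-means center $\hat{\mu}_j = \sum_{i \in S} U_{ij}^{\alpha} x_i / \sum_{i \in S} U_{ij}^{\alpha}$ computed over the sampled set $S$, where the exponent `fuzzifier` ($\alpha$) is a hypothetical parameter, since objective (2) is not reproduced here.

```python
import numpy as np

def estimate_center(X, U, j, m, fuzzifier=2.0, rng=None):
    """Hypothetical completion of Algorithm 6 (estimation step is an assumption).
    X: (n, d) points; U: (n, k) memberships; j: cluster index; m: number of draws."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    S = set()                                # 1: Initialize S <- empty set
    for _ in range(m):                       # 2: for s = 1, ..., m do
        S.add(int(rng.integers(n)))          # 3: sample i uniformly from [n], S <- S | {i}
    idx = np.fromiter(sorted(S), dtype=int)
    w = U[idx, j] ** fuzzifier               # assumed weights U_ij^fuzzifier
    if w.sum() == 0:
        raise ValueError("sampled points have zero membership in cluster j")
    return (w[:, None] * X[idx]).sum(axis=0) / w.sum()

def exact_center(X, U, j, fuzzifier=2.0):
    """Full-data fuzzy k-means center for cluster j, for comparison."""
    w = U[:, j] ** fuzzifier
    return (w[:, None] * X).sum(axis=0) / w.sum()

# Hypothetical toy instance: with m >> n, S almost surely covers all of [n],
# so the sampled estimate coincides with the exact weighted center.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
U = rng.random((5, 3))
U /= U.sum(axis=1, keepdims=True)  # membership rows sum to 1
mu_hat = estimate_center(X, U, j=0, m=2000, rng=np.random.default_rng(1))
print(mu_hat, exact_center(X, U, 0))
```

With $m$ small relative to $n$, the estimate is a weighted mean over a random subset, and a concentration bound such as the one Lemma 6 sets up controls its deviation from the full-data center.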