Goto

Collaborating Authors

 Optimization


Fast Rank-1 Lattice Targeted Sampling for Black-box Optimization

Neural Information Processing Systems

Black-box optimization has gained great attention for its success in recent applications. However, scaling up to high-dimensional problems with good query efficiency remains challenging. This paper proposes a novel Rank-1 Lattice Targeted Sampling (RLTS) technique to address this issue. Our RLTS benefits from random rank-1 lattice Quasi-Monte Carlo, which enables us to perform fast local exact Gaussian processes (GP) training and inference with O(nlogn)complexity w.r.t.




Appendix Potential Negative Societal Impacts

Neural Information Processing Systems

C.3 Other Differences Besides the above discussion, there are some other differences between Daniely [12] and our work. First, they analyze SGD, and we analyze a constrained optimization problem and projected SGD. This may be the reason why we can get a stronger bound on width. In the experiments in Section 5, we observe that SGD performs badly when the width is small (see the first left column in (b), Figure 4). Therefore, we suspect an algorithmic change is needed to train narrow nets with such width (due to the training difficulty), and we indeed propose a new method to train narrow nets. Second, they consider binary {+1, 1}dataset, while our results apply to arbitrary labels. In addition, their proof seems to be highly dependent on the fact that the labels are {+1, 1}, and seems hard to generalize to general labels.


4c4c937b67cc8d785cea1e42ccea185c-Supplemental.pdf

Neural Information Processing Systems

Proof of Proposition 1. Due to Jensen's inequality and the fact that, by assumption, the distribution of human predictions P(h|x) is not a point-mass, it holds that Eh[`(h(x),y) |x] > `(ยตh(x),y). Proof of Theorem 3. We first provide the proof of the unconstrained case. Note that the above problem is a linear program and it decouples with respect to x. Therefore, for each x, the optimal solution is clearly given by: ฯ€ m(d= 1 |x) = 1 if Ey|x[`(m(x),y) Eh|x[`(h,y)]] >0 0 otherwise Next, we provide the proof of the constrained case. To this aim, we consider the dual formulation of the optimization problem, where we only introduce a Lagrangian multiplier ฯ„P,b for the first constraint, i.e., maximize Ex ฯ€(x) Ey,h|x[`(h,y)] Ey|x[`(m(x),y)] + Ex [ฯ„P,b(ฯ€(x) b)] (13) subject to 0 ฯ€(x) 1 x X. (14) 13 The inner minimization problem can be solved using the similar argument for the unconstrained case.


On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry

Neural Information Processing Systems

In this paper, we comparatively analyze the Bures-Wasserstein (BW) geometry with the popular Affine-Invariant (AI) geometry for Riemannian optimization on the symmetric positive definite (SPD) matrix manifold. Our study begins with an observation that the BW metric has a linear dependence on SPD matrices in contrast to the quadratic dependence of the AI metric. We build on this to show that the BW metric is a more suitable and robust choice for several Riemannian optimization problems over ill-conditioned SPD matrices. We show that the BW geometry has a non-negative curvature, which further improves convergence rates of algorithms over the non-positively curved AI geometry. Finally, we verify that several popular cost functions, which are known to be geodesic convex under the AI geometry, are also geodesic convex under the BW geometry. Extensive experiments on various applications support our findings.


Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls

Neural Information Processing Systems

We investigate the optimal design of experimental studies that have pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-inmeans estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units in conjunction with the weights. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation which selects both the treatment and control sets and unit weightings. We prove that these proposed approaches lead to qualitatively different experimental units being selected for treatment. We use simulations based on publicly available data from the USBureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials.


Towards Gradient-based Bilevel Optimization with Non-convex Followers and Beyond Risheng Liu1,2 Yaohua Liu1 Shangzhi Zeng3 Jin Zhang 4,5

Neural Information Processing Systems

In recent years, Bi-Level Optimization (BLO) techniques have received extensive attentions from both learning and vision communities. A variety of BLO models in complex and practical tasks are of non-convex follower structure in nature (a.k.a., without Lower-Level Convexity, LLC for short). However, this challenging class of BLOs is lack of developments on both efficient solution strategies and solid theoretical guarantees. In this work, we propose a new algorithmic framework, named Initialization Auxiliary and Pessimistic Trajectory Truncated Gradient Method (IAPTT-GM), to partially address the above issues. In particular, by introducing an auxiliary as initialization to guide the optimization dynamics and designing a pessimistic trajectory truncation operation, we construct a reliable approximate version of the original BLO in the absence of LLC hypothesis. Our theoretical investigations establish the convergence of solutions returned by IAPTT-GM towards those of the original BLO without LLC. As an additional bonus, we also theoretically justify the quality of our IAPTT-GM embedded with Nesterov's accelerated dynamics under LLC. The experimental results confirm both the convergence of our algorithm without LLC, and the theoretical findings under LLC.



Federated Compositional Deep AUCMaximization

Neural Information Processing Systems

Federated learning has attracted increasing attention due to the promise of balancing privacy and large-scale learning; numerous approaches have been proposed. However, most existing approaches focus on problems with balanced data, and prediction performance is far from satisfactory for many real-world applications where the number of samples in different classes is highly imbalanced. To address this challenging problem, we developed a novel federated learning method for imbalanced data by directly optimizing the area under curve (AUC) score. In particular, we formulate the AUC maximization problem as a federated compositional minimax optimization problem, develop a local stochastic compositional gradient descent ascent with momentum algorithm, and provide bounds on the computational and communication complexities of our algorithm. To the best of our knowledge, this is the first work to achieve such favorable theoretical results. Finally, extensive experimental results confirm the efficacy of our method.