Goto

Collaborating Authors

 allen-zhu


Regression under demographic parity constraints via unlabeled post-processing

Neural Information Processing Systems

We address the problem of performing regression while ensuring demographic parity, even without access to sensitive attributes during inference. We present a general-purpose post-processing algorithm that, using accurate estimates of the regression function and a sensitive attribute predictor, generates predictions that meet the demographic parity constraint. Our method involves discretization and stochastic minimization of a smooth convex function.


SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator

Neural Information Processing Systems

We provide a few error-bound results on its convergence rates. Specially, we prove that theSPIDER-SFO algorithm achieves a gradient computation cost of O min(n1/2 2, 3) to find an -approximate first-order stationary point. In addition, we prove thatSPIDER-SFO nearly matches the algorithmic lower bound for finding stationary point under the gradient Lipschitz assumption in the finite-sum setting.



Regression under demographic parity constraints via unlabeled post-processing

Neural Information Processing Systems

We address the problem of performing regression while ensuring demographic parity, even without access to sensitive attributes during inference. We present a general-purpose post-processing algorithm that, using accurate estimates of the regression function and a sensitive attribute predictor, generates predictions that meet the demographic parity constraint. Our method involves discretization and stochastic minimization of a smooth convex function.


Regression under demographic parity constraints via unlabeled post-processing

arXiv.org Machine Learning

We address the problem of performing regression while ensuring demographic parity, even without access to sensitive attributes during inference. We present a general-purpose post-processing algorithm that, using accurate estimates of the regression function and a sensitive attribute predictor, generates predictions that meet the demographic parity constraint. Our method involves discretization and stochastic minimization of a smooth convex function. It is suitable for online post-processing and multi-class classification tasks only involving unlabeled data for the post-processing. Unlike prior methods, our approach is fully theory-driven. We require precise control over the gradient norm of the convex function, and thus, we rely on more advanced techniques than standard stochastic gradient descent. Our algorithm is backed by finite-sample analysis and post-processing bounds, with experimental results validating our theoretical findings.


Efficient Continual Finite-Sum Minimization

arXiv.org Artificial Intelligence

Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\mapsto \mathbb{R}$, finite-sum minimization seeks a point ${x}^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist into the finite-sum minimization, dubbed as continual finite-sum minimization, that asks for a sequence of points ${x}_1^\star,\ldots,{x}_n^\star \in \mathcal{D}$ such that each ${x}^\star_i \in \mathcal{D}$ minimizes the prefix-sum $\sum_{j=1}^if_j(x)/i$. Assuming that each prefix-sum is strongly convex, we develop a first-order continual stochastic variance reduction gradient method ($\mathrm{CSVRG}$) producing an $\epsilon$-optimal sequence with $\mathcal{\tilde{O}}(n/\epsilon^{1/3} + 1/\sqrt{\epsilon})$ overall first-order oracles (FO). An FO corresponds to the computation of a single gradient $\nabla f_j(x)$ at a given $x \in \mathcal{D}$ for some $j \in [n]$. Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires and the $\mathcal{O}(n^2 \log (1/\epsilon))$ FOs that state-of-the-art variance reduction methods such as $\mathrm{Katyusha}$ require. We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.


Boosting First-Order Methods by Shifting Objective: New Schemes with Faster Worst-Case Rates

arXiv.org Machine Learning

We propose a new methodology to design first-order methods for unconstrained strongly convex problems. Specifically, instead of tackling the original objective directly, we construct a shifted objective function that has the same minimizer as the original objective and encodes both the smoothness and strong convexity of the original objective in an interpolation condition. We then propose an algorithmic template for tackling the shifted objective, which can exploit such a condition. Following this template, we derive several new accelerated schemes for problems that are equipped with various first-order oracles and show that the interpolation condition allows us to vastly simplify and tighten the analysis of the derived methods. In particular, all the derived methods have faster worst-case convergence rates than their existing counterparts. Experiments on machine learning tasks are conducted to evaluate the new methods.


Tight Lower Complexity Bounds for Strongly Convex Finite-Sum Optimization

arXiv.org Machine Learning

Finite-sum optimization plays an important role in the area of machine learning, and hence has triggered a surge of interest in recent years. To address this optimization problem, various randomized incremental gradient methods have been proposed with guaranteed upper and lower complexity bounds for their convergence. Nonetheless, these lower bounds rely on certain conditions: deterministic optimization algorithm, or fixed probability distribution for the selection of component functions. Meanwhile, some lower bounds even do not match the upper bounds of the best known methods in certain cases. To break these limitations, we derive tight lower complexity bounds of randomized incremental gradient methods, including SAG, SAGA, SVRG, and SARAH, for two typical cases of finite-sum optimization. Specifically, our results tightly match the upper complexity of Katyusha when each component function is strongly convex and smooth, and tightly match the upper complexity of SDCA without duality and of KatyushaX when the finite-sum function is strongly convex and the component functions are average smooth.


A General Analysis Framework of Lower Complexity Bounds for Finite-Sum Optimization

arXiv.org Machine Learning

This paper studies the lower bound complexity for the optimization problem whose objective function is the average of $n$ individual smooth convex functions. We consider the algorithm which gets access to gradient and proximal oracle for each individual component. For the strongly-convex case, we prove such an algorithm can not reach an $\varepsilon$-suboptimal point in fewer than $\Omega((n+\sqrt{\kappa n})\log(1/\varepsilon))$ iterations, where $\kappa$ is the condition number of the objective function. This lower bound is tighter than previous results and perfectly matches the upper bound of the existing proximal incremental first-order oracle algorithm Point-SAGA. We develop a novel construction to show the above result, which partitions the tridiagonal matrix of classical examples into $n$ groups. This construction is friendly to the analysis of proximal oracle and also could be used to general convex and average smooth cases naturally.


The Complexity of Making the Gradient Small in Stochastic Convex Optimization

arXiv.org Machine Learning

We give nearly matching upper and lower bounds on the oracle complexity of finding $\epsilon$-stationary points ($\| \nabla F(x) \| \leq\epsilon$) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimization complexity and sample complexity, and reveals some surprising differences between the complexity of stochastic optimization versus learning. Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model. In other words, the separation in complexity between the two models can be exponential, and that the folklore understanding that smoothness is required to find stationary points is only weakly true for statistical learning. Our upper bounds are based on extensions of a recent "recursive regularization" technique proposed by Allen-Zhu (2018). We show how to extend the technique to achieve near-optimal rates, and in particular show how to leverage the extra information available in the global oracle model. Our algorithm for the global model can be implemented efficiently through finite sum methods, and suggests an interesting new computational-statistical tradeoff.