Ying, Yiming
Generalization Analysis for Contrastive Representation Learning
Lei, Yunwen, Yang, Tianbao, Ying, Yiming, Zhou, Ding-Xuan
Recently, contrastive learning has found impressive success in advancing the state of the art in solving various machine learning tasks. However, the existing generalization analysis is very limited or even not meaningful. In particular, the existing generalization error bounds depend linearly on the number $k$ of negative examples while it was widely shown in practice that choosing a large $k$ is necessary to guarantee good generalization of contrastive learning in downstream tasks. In this paper, we establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms. Our analysis uses structural results on empirical covering numbers and Rademacher complexities to exploit the Lipschitz continuity of loss functions. For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds which imply fast rates in a low noise condition. We apply our results to learning with both linear representation and nonlinear representation by deep neural networks, for both of which we derive Rademacher complexity bounds to get improved generalization bounds.
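To make the role of $k$ concrete, the sketch below evaluates one common contrastive objective, the logistic loss on the margins between the positive pair and each of the $k$ negative pairs. The abstract does not fix a particular loss, so this choice, the function name, and the inner-product similarity are assumptions for illustration only.

```python
import numpy as np

def contrastive_logistic_loss(anchor, positive, negatives):
    """Logistic contrastive loss for one anchor with k negatives (illustrative).

    anchor, positive: shape (d,) representations f(x) and f(x+).
    negatives: shape (k, d) representations f(x-_1), ..., f(x-_k).
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negatives = np.asarray(negatives, dtype=float)
    # Margin of the positive pair over each negative pair, shape (k,).
    margins = anchor @ positive - negatives @ anchor
    # log(1 + sum_i exp(-margin_i))
    return float(np.log1p(np.exp(-margins).sum()))
```

For this particular loss, the gradient with respect to the margin vector has $\ell_1$ norm below 1, so the loss is 1-Lipschitz in the $\ell_\infty$ norm for any $k$. This is one example of the kind of Lipschitz structure the abstract refers to.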
Minimax AUC Fairness: Efficient Algorithm with Provable Convergence
Yang, Zhenhuan, Ko, Yan Lok, Varshney, Kush R., Ying, Yiming
The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metrics. Due to the pairwise nature of the AUC, defining an AUC-based group fairness metric is pairwise-dependent and may involve both \emph{intra-group} and \emph{inter-group} AUCs. Importantly, considering only one category of AUCs is not sufficient to mitigate unfairness in AUC optimization. In this paper, we propose a minimax learning and bias mitigation framework that incorporates both intra-group and inter-group AUCs while maintaining utility. Based on this Rawlsian framework, we design an efficient stochastic optimization algorithm and prove its convergence to the minimum group-level AUC. We conduct numerical experiments on both synthetic and real-world datasets to validate the effectiveness of the minimax framework and the proposed optimization algorithm.
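As a rough illustration of intra-group and inter-group AUCs, the sketch below computes, for every ordered pair of groups (a, b), the empirical AUC of positives from group a against negatives from group b, and then reports the worst pair. The function names, the 0/1 label encoding, and the half-credit convention for ties are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def group_aucs(scores, labels, groups):
    """AUC of positives in group a vs. negatives in group b, for all (a, b).

    a == b gives an intra-group AUC, a != b an inter-group AUC.
    """
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    out = {}
    for a in np.unique(groups).tolist():
        for b in np.unique(groups).tolist():
            pos = scores[(labels == 1) & (groups == a)]
            neg = scores[(labels == 0) & (groups == b)]
            if len(pos) and len(neg):  # skip pairs with no data
                out[(a, b)] = pairwise_auc(pos, neg)
    return out

def worst_group_auc(scores, labels, groups):
    """The group pair with the smallest AUC and its value."""
    aucs = group_aucs(scores, labels, groups)
    worst = min(aucs, key=aucs.get)
    return worst, aucs[worst]
```

A minimax (Rawlsian) objective of the kind described above would aim to raise the value returned by `worst_group_auc`, rather than the overall AUC alone.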
Differentially Private SGDA for Minimax Problems
Yang, Zhenhuan, Hu, Shu, Lei, Yunwen, Varshney, Kush R., Lyu, Siwei, Ying, Yiming
Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To the best of our knowledge, this is the first known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting, which is the first known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.
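The general shape of a noisy SGDA iteration can be sketched as follows: sample one example, clip the two partial gradients, add Gaussian noise, then take a descent step in the primal variable and an ascent step in the dual variable. This is a minimal sketch under stated assumptions, not the paper's algorithm: the clipping step, the averaged output, and all names are hypothetical, and the noise scale `sigma` is a free parameter here, since calibrating it to a target $(\epsilon,\delta)$ is not shown.

```python
import numpy as np

def clip_norm(g, clip):
    """Rescale g so that its Euclidean norm is at most `clip`."""
    g = np.asarray(g, dtype=float)
    return g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))

def noisy_sgda(grad_w, grad_v, w0, v0, data, steps, lr, sigma, clip, seed=0):
    """Gradient descent ascent with Gaussian gradient perturbation (illustrative)."""
    rng = np.random.default_rng(seed)
    w, v = np.array(w0, dtype=float), np.array(v0, dtype=float)
    w_avg, v_avg = np.zeros_like(w), np.zeros_like(v)
    for _ in range(steps):
        z = data[rng.integers(len(data))]  # sample one training example
        # Both gradients are taken at the current (w, v): simultaneous update.
        gw = clip_norm(grad_w(w, v, z), clip) + sigma * rng.standard_normal(w.shape)
        gv = clip_norm(grad_v(w, v, z), clip) + sigma * rng.standard_normal(v.shape)
        w = w - lr * gw  # descent on the min variable
        v = v + lr * gv  # ascent on the max variable
        w_avg += w / steps
        v_avg += v / steps
    return w_avg, v_avg
```

With `sigma=0.0` and a large `clip`, this reduces to plain SGDA with iterate averaging, which is a convenient sanity check on a toy convex-concave objective such as $f(w, v; z) = \tfrac12 w^2 + z w v - \tfrac12 v^2$.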
Sum of Ranked Range Loss for Supervised Learning
Hu, Shu, Ying, Yiming, Wang, Xin, Lyu, Siwei
In forming learning objectives, one oftentimes needs to aggregate a set of individual values into a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary/multi-class classification at the sample level and the TKML individual loss for multi-label/multi-class classification at the label level. A combination loss of AoRR and TKML is proposed as a new learning objective for improving the robustness of multi-label learning in the face of outliers in samples and labels alike. Our empirical results highlight the effectiveness of the proposed optimization frameworks and demonstrate the applicability of the proposed losses using synthetic and real data sets.
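A small sketch may help fix the definition of a ranked range. Assuming values are sorted in descending order and the range runs from rank $m+1$ to rank $k$ (this indexing convention and the function names are assumptions here), the sum of a ranked range equals the sum of the top $k$ values minus the sum of the top $m$ values. Each top-$k$ sum is a convex function of the values, which is what makes a difference-of-convex approach natural.

```python
import numpy as np

def top_k_sum(values, k):
    """Sum of the k largest entries of `values`."""
    s = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    return float(s[:k].sum())

def sorr(values, m, k):
    """Sum of the values ranked m+1, ..., k (descending), with 0 <= m < k."""
    return top_k_sum(values, k) - top_k_sum(values, m)
```

For example, `sorr([5, 1, 4, 2, 3], m=1, k=3)` drops the largest value 5 and sums the next two, 4 and 3. Discarding the top $m$ values in this way is how a ranked-range loss can ignore the largest individual losses, which are the ones most likely to come from outliers.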
Stability and Generalization of Stochastic Gradient Methods for Minimax Problems
Lei, Yunwen, Yang, Zhenhuan, Yang, Tianbao, Ying, Yiming
Many machine learning problems, such as Generative Adversarial Networks (GANs), AUC maximization and robust estimation, to mention but a few, can be formulated as minimax problems. A substantial number of studies are devoted to the convergence behavior of their stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how the learning models built from training examples would behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems under both convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.
Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity
Yuan, Zhuoning, Guo, Zhishuai, Xu, Yi, Ying, Yiming, Yang, Tianbao
Deep AUC (area under the ROC curve) Maximization (DAM) has attracted much attention recently due to its great potential for imbalanced data classification. However, the research on Federated Deep AUC Maximization (FDAM) is still limited. Compared with standard federated learning (FL) approaches that focus on decomposable minimization objectives, FDAM is more complicated because its minimization objective is non-decomposable over individual examples. In this paper, we propose improved FDAM algorithms for heterogeneous data by solving the popular non-convex strongly-concave min-max formulation of DAM in a distributed fashion. A striking result of this paper is that the communication complexity of the proposed algorithm is a constant independent of the number of machines and also independent of the accuracy level, which improves an existing result by orders of magnitude. Of independent interest, the proposed algorithm can also be applied to a class of non-convex-strongly-concave min-max problems. The experiments have demonstrated the effectiveness of our FDAM algorithm on benchmark datasets, and on medical chest X-ray images from different organizations. Our experiment shows that the performance of FDAM using data from multiple hospitals can improve the AUC score on testing data from a single hospital for detecting life-threatening diseases based on chest radiographs.
Differentially Private SGD with Non-Smooth Loss
Wang, Puyu, Lei, Yunwen, Ying, Yiming, Zhang, Hai
In this paper, we are concerned with differentially private SGD algorithms in the setting of stochastic convex optimization (SCO). Most existing work requires the loss to be Lipschitz continuous and strongly smooth, and the model parameter to be uniformly bounded. However, these assumptions are restrictive as many popular losses violate these conditions, including the hinge loss for SVM, the absolute loss in robust regression, and even the least squares loss in an unbounded domain. We significantly relax these restrictive assumptions and establish privacy and generalization (utility) guarantees for private SGD algorithms using output and gradient perturbations associated with non-smooth convex losses. Specifically, the loss function is relaxed to have $\alpha$-H\"{o}lder continuous gradient (referred to as $\alpha$-H\"{o}lder smoothness), which covers Lipschitz continuity ($\alpha=0$) and strong smoothness ($\alpha=1$) as special cases. We prove that noisy SGD with $\alpha$-H\"{o}lder smooth losses using gradient perturbation can guarantee $(\epsilon,\delta)$-differential privacy (DP) and attain optimal excess population risk $O\Big(\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}+\frac{1}{\sqrt{n}}\Big)$, up to logarithmic terms, with gradient complexity (i.e. the total number of iterations) $T = O\big(n^{\frac{2-\alpha}{1+\alpha}} + n\big)$. This shows an important trade-off between $\alpha$-H\"{o}lder smoothness of the loss and the computational complexity $T$ for private SGD with statistically optimal performance. In particular, our results indicate that $\alpha$-H\"{o}lder smoothness with $\alpha\ge 1/2$ is sufficient to guarantee $(\epsilon,\delta)$-DP of noisy SGD algorithms while achieving optimal excess risk with linear gradient complexity $T = O(n)$.
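To make the trade-off concrete, the threshold $\alpha \ge 1/2$ can be read off directly from the exponent in the stated gradient complexity. This is simple arithmetic on the bound above, not an additional result:

$$
\frac{2-\alpha}{1+\alpha} \le 1 \iff 2-\alpha \le 1+\alpha \iff \alpha \ge \tfrac{1}{2}.
$$

Evaluating the bound at the two endpoints and at the threshold gives

$$
\alpha = 0:\; T = O(n^{2}), \qquad \alpha = \tfrac{1}{2}:\; T = O(n), \qquad \alpha = 1:\; T = O(n^{1/2} + n) = O(n).
$$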
Stochastic Hard Thresholding Algorithms for AUC Maximization
Yang, Zhenhuan, Zhou, Baojian, Lei, Yunwen, Ying, Yiming
In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization. We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM), from which a stochastic hard thresholding algorithm (\texttt{SHT-AUC}) is developed. To the best of our knowledge, this is the first attempt to provide stochastic hard thresholding algorithms for AUC maximization with a per-iteration cost $\mathcal{O}(b d)$ where $d$ and $b$ are the dimension of the data and the minibatch size, respectively. We show that the proposed algorithm enjoys a linear convergence rate up to a tolerance error. In particular, we show that if the data is generated from a Gaussian distribution, then its convergence becomes slower as the data gets more imbalanced. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.
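The core operation in any hard thresholding method is the projection that keeps the $s$ largest-magnitude coordinates and zeroes out the rest. The sketch below shows that operator together with a generic gradient-then-threshold step. The names `hard_threshold` and `sht_step`, the sparsity level `s`, and the step size `lr` are hypothetical, and the minibatch gradient from the paper's ERM reformulation is not reproduced here and is assumed to be supplied by the caller.

```python
import numpy as np

def hard_threshold(w, s):
    """Keep the s largest-magnitude entries of w and set the others to zero."""
    w = np.asarray(w, dtype=float)
    if s >= w.size:
        return w.copy()
    out = np.zeros_like(w)
    if s <= 0:
        return out
    # Indices of the s largest |w_i| (order among them is not needed).
    keep = np.argpartition(np.abs(w), -s)[-s:]
    out[keep] = w[keep]
    return out

def sht_step(w, minibatch_grad, lr, s):
    """One generic stochastic hard thresholding step: gradient step, then project."""
    w = np.asarray(w, dtype=float)
    g = np.asarray(minibatch_grad, dtype=float)
    return hard_threshold(w - lr * g, s)
```

Given a minibatch gradient that costs $\mathcal{O}(b d)$ to form, the thresholding itself adds only $\mathcal{O}(d)$ on average via partial selection, so the projection does not dominate the per-iteration cost in this sketch.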
Learning by Minimizing the Sum of Ranked Range
Hu, Shu, Ying, Yiming, Wang, Xin, Lyu, Siwei
In forming learning objectives, one oftentimes needs to aggregate a set of individual values into a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification. Our empirical results highlight the effectiveness of the proposed optimization framework and demonstrate the applicability of the proposed losses using synthetic and real datasets.
Online AUC Optimization for Sparse High-Dimensional Datasets
Zhou, Baojian, Ying, Yiming, Skiena, Steven
The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification arising from many application domains where high-dimensional sparse data is abundant. In such cases, each $d$-dimensional sample has only $k$ non-zero features with $k \ll d$, and data arrives sequentially in a streaming form. Current online AUC optimization algorithms have a high per-iteration cost $\mathcal{O}(d)$ and usually produce non-sparse solutions, and hence are not suitable for handling the data challenge mentioned above. In this paper, we aim to directly optimize the AUC score for high-dimensional sparse datasets in the online learning setting and propose a new algorithm, \textsc{FTRL-AUC}. Our proposed algorithm can process data in an online fashion with a much cheaper per-iteration cost $\mathcal{O}(k)$, making it amenable to high-dimensional sparse streaming data analysis. Our new algorithmic design critically depends on a novel reformulation of the U-statistics AUC objective function as an empirical saddle point problem, and the innovative introduction of the "lazy update" rule so that the per-iteration complexity is dramatically reduced from $\mathcal{O}(d)$ to $\mathcal{O}(k)$. Furthermore, \textsc{FTRL-AUC} can inherently capture sparsity more effectively by applying a generalized Follow-The-Regularized-Leader (FTRL) framework. Experiments on real-world datasets demonstrate that \textsc{FTRL-AUC} significantly improves both run time and model sparsity while achieving competitive AUC scores compared with the state-of-the-art methods. Comparison with the online learning method for logistic loss demonstrates that \textsc{FTRL-AUC} achieves higher AUC scores, especially when datasets are imbalanced.