 Statistical Learning


Gaussian Membership Inference Privacy

Neural Information Processing Systems

We propose a novel and practical privacy notion called f-Membership Inference Privacy (f-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, f-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of f-MIP guarantees that we refer to as µ-Gaussian Membership Inference Privacy (µ-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how f-MIP can be amplified by adding noise to gradient updates.
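The last point, adding noise to gradient updates, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names (`clip_gradient`, `noisy_sgd_step`), the per-example clipping step, and all parameter values are assumptions made here for illustration, and the sketch says nothing about which µ-GMIP level a given noise scale yields.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm (assumed step)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def noisy_sgd_step(params, per_example_grads, lr, max_norm, noise_std, rng):
    """One SGD step on the averaged, clipped gradient plus Gaussian noise.

    Hypothetical helper: noise_std is a free parameter here, not a value
    derived from any privacy analysis.
    """
    n = len(per_example_grads)
    dim = len(params)
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    # Average the clipped per-example gradients coordinate by coordinate.
    avg = [sum(g[j] for g in clipped) / n for j in range(dim)]
    # Perturb each coordinate with independent zero-mean Gaussian noise.
    noisy = [a + rng.gauss(0.0, noise_std) for a in avg]
    return [p - lr * g for p, g in zip(params, noisy)]


rng = random.Random(0)
new_params = noisy_sgd_step(
    params=[1.0, 1.0],
    per_example_grads=[[3.0, 4.0], [0.0, 0.0]],
    lr=1.0,
    max_norm=1.0,
    noise_std=0.1,
    rng=rng,
)
```

Setting `noise_std` to zero recovers a plain clipped SGD step, which makes the effect of the added noise easy to isolate.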


Learning Functional Transduction: S.I. Contents

Neural Information Processing Systems

We present below the proofs of the results stated in the main text. Most of the arguments are adapted from the development proposed in (Zhang, 2013), which goes beyond the real or complex-valued RKBS developed in (Zhang et al., 2009; Song et al., 2013) to develop the notion of vector-valued RKBS. In addition, we note that assumptions regarding the properties of the RKBS of interest, such as uniform Fréchet differentiability and uniform convexity, have been further relaxed in other works (Xu and Ye, 2019; Lin et al., 2022) but are here sufficient for our discussion since they guarantee the uniqueness of a semi-inner product ⟨·,·⟩_B compatible with the norm ‖·‖_B (Giles, 1967). S.1.1 Theoretical results. Theorem 1. Theorem 1 gathers, for the sake of compactness, the definition of a vector-valued reproducing kernel Banach space with the properties of existence and uniqueness of the kernel K. Proof. For any v ∈ V and u ∈ U, the mapping O ↦ ⟨O(v), u⟩_U is a bounded linear form in L(B).



Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift

Neural Information Processing Systems

Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. This paper focuses on two types of covariate shift problems and establishes sharp convergence rates for a general loss function, providing a unified theoretical analysis that concurs with the optimal results in the literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.


k-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy

Neural Information Processing Systems

We propose a new initialization scheme for the k-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of a metric embedding tree structure of the data. We propose a novel and efficient search algorithm which finds initial centers that can be used subsequently for the local search algorithm. The so-called HST initialization method can produce initial centers achieving lower error than those from another popular method, k-median++, also with higher efficiency when k is not too small. Our HST initialization is then extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error of applying DP local search followed by our private HST initialization improves prior results on the approximation error, and approaches the lower bound within a small factor. Experiments demonstrate the effectiveness of our proposed methods.



BanditPAM++: Faster k-medoids Clustering

Neural Information Processing Systems

Clustering is a fundamental task in data science with wide-ranging applications. In k-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects, respectively.
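The two defining features mentioned above (centers are actual datapoints, and any distance function may be used) can be seen in a small sketch. This is a naive alternating heuristic written for illustration only; it is neither PAM nor BanditPAM++, and the names `k_medoids`, `assign`, `update_medoid`, and `hamming` are hypothetical. Strings are used as an example of non-vector objects.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def update_medoid(points, members, dist):
    """Pick the member index minimizing total distance to the other members."""
    return min(members, key=lambda i: sum(dist(points[i], points[j]) for j in members))


def k_medoids(points, init_medoids, dist, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(init_medoids)  # medoids are indices into points
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # Keep the old medoid if a cluster happens to be empty.
            new_medoids.append(update_medoid(points, members, dist) if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)


words = ["cat", "car", "cap", "dog", "dot", "dig"]
medoids, labels = k_medoids(words, init_medoids=[0, 3], dist=hamming)
centers = [words[i] for i in medoids]
```

Because medoids are stored as indices into `words`, every returned center is guaranteed to be one of the input objects, which is what makes the centers directly interpretable.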

