Goto

Collaborating Authors

 Upadhyay, Jalaj


The Price of Privacy for Low-rank Factorization

Neural Information Processing Systems

In this paper, we study what price one has to pay to release \emph{differentially private low-rank factorization} of a matrix. We consider various settings that are close to the real world applications of low-rank factorization: (i) the manner in which matrices are updated (row by row or in an arbitrary manner), (ii) whether matrices are distributed or not, and (iii) how the output is produced (once at the end of all updates, also known as \emph{one-shot algorithms} or continually). Even though these settings are well studied without privacy, surprisingly, there are no private algorithm for these settings (except when a matrix is updated row by row). We present the first set of differentially private algorithms for all these settings. Our algorithms when private matrix is updated in an arbitrary manner promise differential privacy with respect to two stronger privacy guarantees than previously studied, use space and time \emph{comparable} to the non-private algorithm, and achieve \emph{optimal accuracy}. To complement our positive results, we also prove that the space required by our algorithms is optimal up to logarithmic factors. When data matrices are distributed over multiple servers, we give a non-interactive differentially private algorithm with communication cost independent of dimension. In concise, we give algorithms that incur {\em optimal cost across all parameters of interest}. We also perform experiments to verify that all our algorithms perform well in practice and outperform the best known algorithm until now for large range of parameters.


Differentially Private Robust Low-Rank Approximation

Neural Information Processing Systems

In this paper, we study the following robust low-rank matrix approximation problem: given a matrix $A \in \R^{n \times d}$, find a rank-$k$ matrix $B$, while satisfying differential privacy, such that $ \norm{ A - B }_p \leq \alpha \mathsf{OPT}_k(A) + \tau,$ where $\norm{ M }_p$ is the entry-wise $\ell_p$-norm and $\mathsf{OPT}_k(A):=\min_{\mathsf{rank}(X) \leq k} \norm{ A - X}_p$. It is well known that low-rank approximation w.r.t. entrywise $\ell_p$-norm, for $p \in [1,2)$, yields robustness to gross outliers in the data. We propose an algorithm that guarantees $\alpha=\widetilde{O}(k^2), \tau=\widetilde{O}(k^2(n+kd)/\varepsilon)$, runs in $\widetilde O((n+d)\poly~k)$ time and uses $O(k(n+d)\log k)$ space. We study extensions to the streaming setting where entries of the matrix arrive in an arbitrary order and output is produced at the very end or continually. We also study the related problem of differentially private robust principal component analysis (PCA), wherein we return a rank-$k$ projection matrix $\Pi$ such that $\norm{ A - A \Pi }_p \leq \alpha \mathsf{OPT}_k(A) + \tau.$


The Price of Privacy for Low-rank Factorization

Neural Information Processing Systems

In this paper, we study what price one has to pay to release \emph{differentially private low-rank factorization} of a matrix. We consider various settings that are close to the real world applications of low-rank factorization: (i) the manner in which matrices are updated (row by row or in an arbitrary manner), (ii) whether matrices are distributed or not, and (iii) how the output is produced (once at the end of all updates, also known as \emph{one-shot algorithms} or continually). Even though these settings are well studied without privacy, surprisingly, there are no private algorithm for these settings (except when a matrix is updated row by row). We present the first set of differentially private algorithms for all these settings. Our algorithms when private matrix is updated in an arbitrary manner promise differential privacy with respect to two stronger privacy guarantees than previously studied, use space and time \emph{comparable} to the non-private algorithm, and achieve \emph{optimal accuracy}. To complement our positive results, we also prove that the space required by our algorithms is optimal up to logarithmic factors. When data matrices are distributed over multiple servers, we give a non-interactive differentially private algorithm with communication cost independent of dimension. In concise, we give algorithms that incur {\em optimal cost across all parameters of interest}. We also perform experiments to verify that all our algorithms perform well in practice and outperform the best known algorithm until now for large range of parameters.