 data owner






How to Securely Shuffle? A survey about Secure Shufflers for privacy-preserving computations

Damie, Marc, Hahn, Florian, Peter, Andreas, Ramon, Jan

arXiv.org Artificial Intelligence

Ishai et al. (FOCS'06) introduced secure shuffling as an efficient building block for private data aggregation. Recently, the field of differential privacy has revived interest in secure shufflers by highlighting the privacy amplification they can provide in various computations. Although several works argue for the utility of secure shufflers, they often treat them as black boxes, overlooking the practical vulnerabilities and performance trade-offs of existing implementations. This leaves a central question open: what makes a good secure shuffler? This survey addresses that question by identifying, categorizing, and comparing 26 secure protocols that realize the necessary shuffling functionality. To enable a meaningful comparison, we adapt and unify existing security definitions into a consistent set of properties. We also present an overview of privacy-preserving technologies that rely on secure shufflers, offer practical guidelines for selecting appropriate protocols, and outline promising directions for future work.
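
As a rough illustration of how a shuffler can serve private aggregation, the sketch below follows the general split-and-mix idea: each data owner splits its value into random additive shares, all shares pass through a shuffler, and the analyzer only sums what comes out. The modulus, the function names, and the use of Python's non-cryptographic `random` module are assumptions made for readability; this is a toy model, not one of the 26 surveyed protocols and not a secure implementation.

```python
import random

MODULUS = 2 ** 16  # hypothetical modulus; real protocols choose it from the input range


def split_into_shares(value, num_shares, rng):
    """Split value into additive shares that sum to value modulo MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(num_shares - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def shuffle_messages(messages, rng):
    """Stand-in for the secure shuffler: returns the messages in random order.

    random.Random is NOT cryptographically secure; it is used here only so the
    example is reproducible.
    """
    shuffled = list(messages)
    rng.shuffle(shuffled)
    return shuffled


def aggregate(private_values, num_shares=3, seed=0):
    """Sum private values while the analyzer only sees shuffled shares."""
    rng = random.Random(seed)
    messages = []
    for value in private_values:
        messages.extend(split_into_shares(value, num_shares, rng))
    mixed = shuffle_messages(messages, rng)
    # Individual shares look random, but their total equals the sum of inputs.
    return sum(mixed) % MODULUS
```

Calling `aggregate([3, 5, 10])` recovers the total 18 as long as the true sum is below the modulus, while no single shuffled message reveals which owner it came from or what that owner's value was.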


Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine Learning

Damie, Marc, Hahn, Florian, Peter, Andreas, Ramon, Jan

arXiv.org Artificial Intelligence

To preserve privacy, multi-party computation (MPC) enables executing Machine Learning (ML) algorithms on secret-shared or encrypted data. However, existing MPC frameworks are not optimized for sparse data. This makes them unsuitable for ML applications involving sparse data, e.g., recommender systems or genomics. Even in plaintext, such applications involve high-dimensional sparse data that cannot be processed without sparsity-related optimizations due to prohibitively large memory requirements. Since matrix multiplication is central to ML algorithms, we propose MPC algorithms to multiply secret sparse matrices. On the one hand, our algorithms avoid the memory issues of the "dense" data representation of classic secure matrix multiplication algorithms. On the other hand, our algorithms can significantly reduce communication costs (by a factor of 1000 in some experiments) for realistic problem sizes. We validate our algorithms in two ML applications in which existing protocols are impractical. An important question when developing MPC algorithms is what assumptions can be made. In our case, if the number of non-zeros in a row is a sensitive piece of information, then a short runtime may reveal that the number of non-zeros is small. Existing approaches make relatively simple assumptions, e.g., that there is a universal upper bound to the number of non-zeros in a row. This often does not align with statistical reality: in many sparse datasets, the amount of data per instance follows a power law. We propose an approach which allows adopting a safe upper bound on the distribution of non-zeros in rows/columns of sparse matrices.
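
To make the sparsity argument concrete, here is a plaintext sketch (no secret sharing, so not the paper's MPC algorithms) of a sparse matrix product that only touches non-zero entries, together with a helper that pads a row to a fixed bound so that its length no longer reveals the true number of non-zeros. The dict-of-rows layout, the function names, and the dummy column index are assumptions chosen for illustration.

```python
def sparse_matmul(a_rows, b_rows):
    """Multiply two sparse matrices stored as {row: {col: value}} dictionaries.

    Only non-zero entries are stored and visited, so memory and work scale
    with the number of non-zeros rather than with the full dimensions.
    """
    result = {}
    for i, a_row in a_rows.items():
        out_row = {}
        for k, a_val in a_row.items():
            # Row k of B contributes a_val * B[k][j] to C[i][j].
            for j, b_val in b_rows.get(k, {}).items():
                out_row[j] = out_row.get(j, 0) + a_val * b_val
        if out_row:
            result[i] = out_row
    return result


def pad_row(row, bound, dummy_col=-1):
    """Pad a sparse row with zero-valued dummy entries up to `bound` entries.

    Illustrates the simple 'universal upper bound' assumption: every padded
    row has the same length, whatever its real number of non-zeros.
    """
    if len(row) > bound:
        raise ValueError("row has more non-zeros than the assumed bound")
    entries = sorted(row.items())
    return entries + [(dummy_col, 0)] * (bound - len(entries))
```

With a single universal bound, every row is padded to the size of the densest row, which is wasteful when non-zero counts follow a power law; that cost is what motivates a bound on the distribution instead.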


A Gradient analysis

Neural Information Processing Systems

To better understand why our generated confounder noise can make the data unlearnable, we can also gain some insights from the optimization gradients. Empirically, if one image provides a large gradient in a backpropagation, this image has a lot of learnable knowledge, and vice versa. Figure 9 shows the accuracy curves of our method during the training epoch. Then we give a detailed discussion about this setting. To better understand this adaptive setting, we first illustrate the assumption on the data owner's The model trainer wishes to train a denoiser against the noise generated by the ConfounderGAN.



Accurate and Private Diagnosis of Rare Genetic Syndromes from Facial Images with Federated Deep Learning

Ünal, Ali Burak, Baykara, Cem Ata, Krawitz, Peter, Akgün, Mete

arXiv.org Artificial Intelligence

Machine learning has shown promise in facial dysmorphology, where characteristic facial features provide diagnostic clues for rare genetic disorders. GestaltMatcher, a leading framework in this field, has demonstrated clinical utility across multiple studies, but its reliance on centralized datasets limits further development, as patient data are siloed across institutions and subject to strict privacy regulations. We introduce a federated GestaltMatcher service based on a cross-silo horizontal federated learning framework, which allows hospitals to collaboratively train a global ensemble feature extractor without sharing patient images. Patient data are mapped into a shared latent space, and a privacy-preserving kernel matrix computation framework enables syndrome inference and discovery while safeguarding confidentiality. New participants can directly benefit from and contribute to the system by adopting the global feature extractor and kernel configuration from previous training rounds. Experiments show that the federated service retains over 90% of centralized performance and remains robust to both varying silo numbers and heterogeneous data distributions.
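
The abstract does not state which aggregation rule the service uses. As a hedged sketch of cross-silo horizontal federated learning in general, the function below applies sample-weighted parameter averaging (in the style of FedAvg, which is an assumption here): each hospital trains locally and shares only its model parameters and sample count, never patient images. The function name and the flat list representation of parameters are illustrative.

```python
def federated_average(silo_updates):
    """Combine local models into a global one by sample-weighted averaging.

    silo_updates: list of (parameters, num_samples) pairs, where parameters
    is a flat list of floats produced by local training at one silo.
    """
    total_samples = sum(num for _, num in silo_updates)
    if total_samples == 0:
        raise ValueError("no training samples reported by any silo")
    dim = len(silo_updates[0][0])
    global_params = [0.0] * dim
    for params, num in silo_updates:
        if len(params) != dim:
            raise ValueError("all silos must share the same model shape")
        for idx, value in enumerate(params):
            # Silos with more data contribute proportionally more.
            global_params[idx] += value * num / total_samples
    return global_params
```

For example, `federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])` weights the second silo three times as heavily as the first. A new participant could start from the returned global parameters, which mirrors the abstract's point that newcomers adopt the feature extractor from previous rounds.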


zkUnlearner: A Zero-Knowledge Framework for Verifiable Unlearning with Multi-Granularity and Forgery-Resistance

Wang, Nan, Wu, Nan, Hui, Xiangyu, Wang, Jiafan, Yuan, Xin

arXiv.org Artificial Intelligence

As the demand for exercising the "right to be forgotten" grows, the need for verifiable machine unlearning has become increasingly evident to ensure both transparency and accountability. We present zkUnlearner, the first zero-knowledge framework for verifiable machine unlearning, specifically designed to support multi-granularity and forgery-resistance. First, we propose a general computational model that employs a bit-masking technique to enable the selectivity of existing zero-knowledge proofs of training for gradient descent algorithms. This innovation enables not only traditional sample-level unlearning but also more advanced feature-level and class-level unlearning. Our model can be translated to arithmetic circuits, ensuring compatibility with a broad range of zero-knowledge proof systems. Furthermore, our approach overcomes key limitations of existing methods in both efficiency and privacy. Second, forging attacks present a serious threat to the reliability of unlearning. Specifically, in Stochastic Gradient Descent optimization, gradients from unlearned data, or from minibatches containing it, can be forged using alternative data samples or minibatches that exclude it. We propose the first effective strategies to resist state-of-the-art forging attacks. Finally, we benchmark a zkSNARK-based instantiation of our framework and perform comprehensive performance evaluations to validate its practicality.
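
One way to picture the bit-masking idea is a gradient step in which 0/1 masks decide which samples and which features take part. The sketch below uses plain linear regression with a squared-error loss and carries no zero-knowledge proof; the model, the function name, and the exact masking rule are assumptions for illustration, not the zkUnlearner construction. Masks are applied by multiplication rather than branching, which is the kind of operation that maps naturally onto arithmetic circuits.

```python
def masked_gradient_step(weights, samples, labels, sample_mask, feature_mask, lr=0.1):
    """One gradient-descent step for linear regression under 0/1 masks.

    sample_mask[i] == 0 removes sample i from the update (sample-level).
    feature_mask[j] == 0 removes feature j from every sample (feature-level).
    """
    active = sum(sample_mask)
    if active == 0:
        return list(weights)
    grad = [0.0] * len(weights)
    for x, y, keep in zip(samples, labels, sample_mask):
        # Zero out unlearned features by multiplication, not by branching.
        masked_x = [xi * m for xi, m in zip(x, feature_mask)]
        pred = sum(w * xi for w, xi in zip(weights, masked_x))
        # Multiplying by the sample bit cancels the whole contribution.
        err = (pred - y) * keep
        for j, xi in enumerate(masked_x):
            grad[j] += err * xi
    return [w - lr * g / active for w, g in zip(weights, grad)]
```

Under this reading, class-level unlearning would amount to clearing the sample bit of every sample that carries the class to be forgotten.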