
Collaborating Authors

Aharoni, Ehud


Efficient Pruning for Machine Learning Under Homomorphic Encryption

arXiv.org Artificial Intelligence

Privacy-preserving machine learning (PPML) solutions are gaining widespread popularity. Among these, many rely on homomorphic encryption (HE), which offers confidentiality of the model and the data, but at the cost of high latency and large memory requirements. Pruning neural network (NN) parameters improves latency and memory in plaintext ML but has little impact if applied directly to HE-based PPML. We introduce a framework called HE-PEx that comprises new pruning methods, on top of a packing technique called tile tensors, for reducing the latency and memory of PPML inference. HE-PEx uses permutations to prune additional ciphertexts and expansion to recover the resulting loss in inference accuracy. We demonstrate the effectiveness of our methods for pruning fully-connected and convolutional layers in NNs on PPML tasks, namely image compression, denoising, and classification, with autoencoders, multilayer perceptrons (MLPs), and convolutional neural networks (CNNs). We implement and deploy our networks atop a framework called HElayers; the pruned networks show a 10-35% improvement in inference speed and a 17-35% decrease in memory requirements, corresponding to 33-65% fewer ciphertexts, while staying within a 2.5% degradation in inference accuracy relative to the unpruned networks. Compared to the state-of-the-art pruning technique for PPML, our techniques generate networks with 70% fewer ciphertexts, on average, for the same degradation limit.
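To give a rough intuition for the prune-and-permute idea, here is a toy numpy sketch (the tile shape, the greedy sort heuristic, and the function name below are illustrative assumptions, not the HE-PEx algorithm itself): magnitude-prune a weight matrix, permute its rows and columns so zeros cluster together, and count the ciphertext-sized tiles that become entirely zero and could therefore be dropped.

```python
import numpy as np

def prune_and_pack(W, sparsity=0.8, tile=(4, 4)):
    """Toy illustration of prune -> permute -> count droppable tiles.
    A simplified stand-in for the prune/permute steps described in the
    abstract, not the paper's actual algorithm."""
    # 1. Magnitude pruning: zero out the smallest-magnitude weights.
    thresh = np.quantile(np.abs(W), sparsity)
    Wp = np.where(np.abs(W) >= thresh, W, 0.0)

    # 2. Greedy heuristic: sort rows and columns by nonzero count so
    #    that zeros tend to cluster into contiguous blocks.
    row_perm = np.argsort((Wp != 0).sum(axis=1))
    col_perm = np.argsort((Wp != 0).sum(axis=0))
    Wperm = Wp[row_perm][:, col_perm]

    # 3. Count tiles (ciphertext-sized blocks) that became all-zero.
    th, tw = tile
    zero_tiles = total = 0
    for i in range(0, Wperm.shape[0], th):
        for j in range(0, Wperm.shape[1], tw):
            total += 1
            if not Wperm[i:i + th, j:j + tw].any():
                zero_tiles += 1
    return Wperm, zero_tiles, total

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
_, empty, total = prune_and_pack(W)
print(f"{empty}/{total} tiles are all-zero and need no ciphertext")
```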


Power-Softmax: Towards Secure LLM Inference over Encrypted Data

arXiv.org Artificial Intelligence

Modern cryptographic methods for privacy-preserving LLM inference, such as Homomorphic Encryption (HE), require the LLM to have a polynomial form. Forming such a representation is challenging because Transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach might introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs with 32 layers and over a billion parameters, exceeding the size of previous models by more than tenfold. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to standard transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore the differences in inductive bias between transformers relying on our HE-friendly variant and standard transformers. Our code is attached as a supplement.
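To illustrate the kind of HE-friendly, polynomial attention the abstract describes, here is a minimal numpy sketch assuming a power-based normalization of the form s_i^p / sum_j s_j^p; the exact Power-Softmax formulation, exponent, and stabilization used in the paper may differ, and the names and parameters below are purely illustrative.

```python
import numpy as np

def power_softmax(scores, p=2, eps=1e-6):
    """Toy polynomial alternative to Softmax: normalize even powers of
    the scores instead of exponentials, so the attention map is a
    polynomial in its inputs and thus HE-friendly to evaluate."""
    num = scores ** p                      # even power keeps values non-negative
    return num / (num.sum(axis=-1, keepdims=True) + eps)

def attention(Q, K, V, p=2):
    """Scaled dot-product attention with the polynomial normalization."""
    d = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d)
    return power_softmax(scores, p=p) @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(2, 8, 16)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (2, 8, 16)
```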


HeLayers: A Tile Tensors Framework for Large Neural Networks on Encrypted Data

arXiv.org Artificial Intelligence

Privacy-preserving solutions enable companies to offload confidential data to third-party services while complying with government regulations. To accomplish this, they leverage various cryptographic techniques such as Homomorphic Encryption (HE), which allows performing computation on encrypted data. Most HE schemes work in a SIMD fashion, and the data packing method can dramatically affect the running time and memory costs. Finding a packing method that leads to an optimally performant implementation is a hard task. We present a simple and intuitive framework that abstracts the packing decision for the user. We explain its underlying data structures and optimizer, and propose a novel algorithm for performing 2D convolution operations. We used this framework to implement an HE-friendly version of AlexNet, which runs in three minutes, several orders of magnitude faster than other state-of-the-art solutions that only use HE.
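As a rough illustration of tile-tensor-style packing, the following is a toy sketch: pad a matrix to a multiple of a fixed tile shape and split it into tiles, each of which would fill the SIMD slots of one ciphertext. The tile shape and function name are arbitrary assumptions; real HeLayers tile tensors also handle duplication, broadcasting, and richer shapes not shown here.

```python
import numpy as np

def pack_tile_tensor(A, tile=(4, 8)):
    """Pad a matrix to a multiple of the tile shape, then split it into
    tiles; each flattened tile stands in for the slot vector of one
    SIMD ciphertext. A simplified sketch, not the HeLayers API."""
    th, tw = tile
    rows = -(-A.shape[0] // th) * th       # round up to tile multiples
    cols = -(-A.shape[1] // tw) * tw
    padded = np.zeros((rows, cols), dtype=A.dtype)
    padded[:A.shape[0], :A.shape[1]] = A
    tiles = (padded
             .reshape(rows // th, th, cols // tw, tw)
             .transpose(0, 2, 1, 3)        # (tile_row, tile_col, th, tw)
             .reshape(-1, th * tw))        # one flat slot vector per "ciphertext"
    return tiles

A = np.arange(10 * 12, dtype=float).reshape(10, 12)
slots = pack_tile_tensor(A)
print(slots.shape)  # (6, 32): a 3x2 grid of tiles, 32 slots each
```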


Hartigan's K-Means Versus Lloyd's K-Means — Is It Time for a Change?

AAAI Conferences

Hartigan's method for k-means clustering holds several potential advantages compared to the classical and prevalent optimization heuristic known as Lloyd's algorithm. For example, it was recently shown that the set of local minima of Hartigan's algorithm is a subset of those of Lloyd's method. We develop a closed-form expression that makes it possible to apply Hartigan's method to k-means clustering with any Bregman divergence, and further strengthen the case for preferring Hartigan's algorithm over Lloyd's algorithm. Specifically, we characterize a range of problems with various noise levels of the inputs, for which any random partition represents a local minimum for Lloyd's algorithm, while Hartigan's algorithm easily converges to the correct solution. Extensive experiments on synthetic and real-world data further support our theoretical analysis.
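For readers unfamiliar with Hartigan's update rule, here is a minimal Python sketch for the squared-Euclidean case only (the paper's closed-form Bregman-divergence generalization is not reproduced): each point is tentatively moved to the cluster that lowers the within-cluster sum of squares, with both affected centroids updated incrementally.

```python
import numpy as np

def hartigan_kmeans(X, k, iters=50, seed=0):
    """Hartigan-style k-means for squared Euclidean distance.
    A move of point x from cluster a (size n_a) to b (size n_b) is
    accepted when n_b/(n_b+1)*||x-c_b||^2 < n_a/(n_a-1)*||x-c_a||^2."""
    rng = np.random.default_rng(seed)
    labels = rng.permutation(np.arange(len(X)) % k)   # balanced random start
    centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
    sizes = np.bincount(labels, minlength=k).astype(float)

    for _ in range(iters):
        moved = False
        for idx, x in enumerate(X):
            a = labels[idx]
            if sizes[a] <= 1:
                continue                               # keep clusters non-empty
            # Objective decrease from removing x vs. increase from inserting it.
            remove_gain = sizes[a] / (sizes[a] - 1) * np.sum((x - centers[a]) ** 2)
            insert_cost = sizes / (sizes + 1) * np.sum((x - centers) ** 2, axis=1)
            insert_cost[a] = np.inf
            b = int(np.argmin(insert_cost))
            if insert_cost[b] < remove_gain:           # the move lowers the objective
                centers[a] = (centers[a] * sizes[a] - x) / (sizes[a] - 1)
                centers[b] = (centers[b] * sizes[b] + x) / (sizes[b] + 1)
                sizes[a] -= 1
                sizes[b] += 1
                labels[idx] = b
                moved = True
        if not moved:
            break
    return labels, centers

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(50, 2)) for m in (0, 3, 6)])
labels, centers = hartigan_kmeans(X, k=3)
print(np.round(centers, 2))
```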