Exponentiation


Learning Modular Exponentiation with Transformers

Africa, David Demitri, Kapoor, Sara M., Sorg, Theo Simon, Mishra, Challenger

arXiv.org Artificial Intelligence

Modular exponentiation is crucial to number theory and cryptography, yet remains largely unexplored from a mechanistic interpretability standpoint. We train a 4-layer encoder-decoder Transformer model to perform this operation and investigate the emergence of numerical reasoning during training. Utilizing principled sampling strategies, PCA-based embedding analysis, and activation patching, we examine how number-theoretic properties are encoded within the model. We find that reciprocal operand training leads to strong performance gains, with sudden generalization across related moduli. These synchronized accuracy surges reflect grokking-like dynamics, suggesting the model internalizes shared arithmetic structure. We also find a subgraph, consisting entirely of attention heads in the final layer, that is sufficient to achieve full performance on the task of regular exponentiation. These results suggest that transformer models learn modular arithmetic through specialized computational circuits, paving the way for more interpretable and efficient neural approaches to modular exponentiation.
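As a rough illustration of the task itself (not the authors' training pipeline), a minimal sketch of how supervised examples for modular exponentiation might be generated could look like the following; the value ranges and the sequence encoding are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's code): generating supervised examples for
# the modular exponentiation task a^b mod m.  Ranges and the text encoding
# are illustrative assumptions only.
import random

def sample_example(max_val=100):
    a = random.randint(2, max_val)   # base
    b = random.randint(2, max_val)   # exponent
    m = random.randint(2, max_val)   # modulus
    y = pow(a, b, m)                 # Python's built-in modular exponentiation
    return f"{a} ^ {b} mod {m}", str(y)   # source / target strings

if __name__ == "__main__":
    for _ in range(3):
        src, tgt = sample_example()
        print(src, "->", tgt)
```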


Learning Mathematical Rules with Large Language Models

Gorceix, Antoine, Chenadec, Bastien Le, Rammal, Ahmad, Vadori, Nelson, Veloso, Manuela

arXiv.org Artificial Intelligence

In this paper, we study the ability of large language models to learn specific mathematical rules such as distributivity or simplifying equations. We present an empirical analysis of their ability to generalize these rules, as well as to reuse them in the context of word problems. For this purpose, we provide a rigorous methodology to build synthetic data incorporating such rules, and perform fine-tuning of large language models on such data. Our experiments show that our model can learn and generalize these rules to some extent, as well as suitably reuse them in the context of word problems.
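To make the setup concrete, here is a hedged sketch of what synthetic data for a single rule such as distributivity might look like; the prompt format and coefficient ranges are assumptions, not the paper's actual generator.

```python
# Hypothetical sketch of synthetic data for the distributivity rule
# a*(x + c) = a*x + a*c; format and ranges are assumptions for illustration.
import random

def distributivity_example():
    a = random.randint(2, 9)
    c = random.randint(1, 9)
    prompt = f"Expand: {a}*(x + {c})"
    target = f"{a}*x + {a * c}"
    return prompt, target

if __name__ == "__main__":
    for _ in range(3):
        p, t = distributivity_example()
        print(p, "->", t)
```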


Spin: An Efficient Secure Computation Framework with GPU Acceleration

Jiang, Wuxuan, Song, Xiangjun, Hong, Shenbai, Zhang, Haijun, Liu, Wenxin, Zhao, Bo, Xu, Wei, Li, Yi

arXiv.org Artificial Intelligence

Accuracy and efficiency remain challenges for multi-party computation (MPC) frameworks. Spin is a GPU-accelerated MPC framework that supports multiple computation parties and a dishonest-majority adversarial setup. We propose optimized protocols for the non-linear functions that are critical for machine learning, as well as several novel optimizations specific to attention, the fundamental building block of Transformer models, allowing Spin to train non-trivial CNNs and perform Transformer inference without sacrificing security. At the backend level, Spin leverages GPUs, CPUs, and RDMA-enabled smart network cards for acceleration. Comprehensive evaluations demonstrate that Spin can be up to $2\times$ faster than the state of the art for deep neural network training. For inference on a Transformer model with 18.9 million parameters, our attention-specific optimizations enable Spin to achieve better efficiency, lower communication, and better accuracy.


On algorithmically boosting fixed-point computations

Avramopoulos, Ioannis, Vasiloglou, Nikolaos

arXiv.org Artificial Intelligence

The main topic of this paper is algorithms for computing Nash equilibria. We cast our particular methods as instances of a general algorithmic abstraction, namely, a method we call {\em algorithmic boosting}, which is also relevant to other fixed-point computation problems. Algorithmic boosting is the principle of computing fixed points by taking (long-run) averages of iterated maps; it is a generalization of exponentiation. First, we define our method in the setting of nonlinear maps. Second, we restrict attention to convergent linear maps (for computing dominant eigenvectors, for example, in the PageRank algorithm) and show that our algorithmic boosting method can set in motion {\em exponential speedups in the convergence rate}. Third, we show that algorithmic boosting can convert a (weak) non-convergent iterator into a (strong) convergent one. We also consider a {\em variational approach} to algorithmic boosting, providing tools to convert a non-convergent continuous flow into a convergent one. Then, by embedding the construction of averages in the design of the iterated map, we constructively prove the existence of Nash equilibria (and, therefore, of Brouwer fixed points). We then discuss implementations of averaging and exponentiation, an important matter even in the scalar case. Finally, we discuss a relationship between dominant (PageRank) eigenvectors and Nash equilibria.
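As a toy illustration of the averaging principle (not the paper's construction), the snippet below iterates a small column-stochastic map and compares the raw iterate with the long-run average of the iterates; both approach the dominant, PageRank-style eigenvector.

```python
# Toy illustration of computing a fixed point by averaging iterated maps.
# The matrix and iteration counts are arbitrary example values.
import numpy as np

T = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.5],
              [0.5, 0.2, 0.2]])      # columns sum to 1 (column-stochastic map)

x = np.ones(3) / 3                   # initial distribution
avg = np.zeros(3)
for k in range(1, 51):
    x = T @ x                        # iterated map
    avg += (x - avg) / k             # running (Cesàro) average of the iterates

# Reference: dominant eigenvector of T, normalized to sum to 1.
vals, vecs = np.linalg.eig(T)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v = v / v.sum()

print("iterate :", np.round(x, 4))
print("average :", np.round(avg, 4))
print("eigvec  :", np.round(v, 4))
```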


On the (In)Security of ElGamal in OpenPGP

Communications of the ACM

Let $G$ be a group and $g \in G$ a generator. To create a key pair $(sk, pk)$, pick a random integer $x$, compute the element $X \leftarrow g^x$, and output $(sk, pk) := (x, X)$. Given $pk$, to encrypt a message $M$, pick an ephemeral random integer $y$, compute the elements $Y \leftarrow g^y$ and $Z \leftarrow X^y = g^{xy}$, and output $C := (C_1, C_2) := (Y, M \cdot Z)$ as the ciphertext. Given $sk$, to decrypt $C$, first recover the element $Z$ from $C_1$ as per $Z \leftarrow Y^x = g^{yx}$ and then use $C_2, Z$ to recover $M \leftarrow C_2 / Z$. To instantiate the scheme, the following details have to be fixed: which group $G$ shall be used?
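As a concrete (and deliberately insecure) illustration of the textbook scheme just described, here is a toy Python sketch over the integers modulo a tiny prime; the parameters are placeholders and bear no resemblance to what an OpenPGP implementation would actually use.

```python
# Toy textbook ElGamal over Z_p^* with a tiny prime -- illustration only,
# not a secure or OpenPGP-conformant instantiation.
import random

p = 467                      # small prime (toy group modulus)
g = 2                        # group element used as generator (toy choice)

def keygen():
    x = random.randrange(2, p - 1)        # secret key
    X = pow(g, x, p)                      # public key X = g^x
    return x, X

def encrypt(X, M):
    y = random.randrange(2, p - 1)        # ephemeral secret
    Y = pow(g, y, p)                      # C1 = g^y
    Z = pow(X, y, p)                      # shared secret Z = X^y = g^{xy}
    return Y, (M * Z) % p                 # C2 = M * Z

def decrypt(x, C):
    Y, C2 = C
    Z = pow(Y, x, p)                      # Z = Y^x = g^{yx}
    return (C2 * pow(Z, -1, p)) % p       # M = C2 / Z via modular inverse

sk, pk = keygen()
C = encrypt(pk, 42)
assert decrypt(sk, C) == 42
```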


Secure Quantized Training for Deep Learning

Keller, Marcel, Sun, Ke

arXiv.org Artificial Intelligence

We implement neural network training in secure multi-party computation (MPC) using the quantization commonly employed in that setting. We are the first to present an MNIST classifier purely trained in MPC that comes within 0.2 percent of the accuracy of the same convolutional neural network trained via plaintext computation. More concretely, we have trained a network with two convolutional and two dense layers to 99.2% accuracy in 3.5 hours (under one hour for 99% accuracy). We have also implemented AlexNet for CIFAR-10, which converges in a few hours. We develop novel protocols for exponentiation and inverse square root. Finally, we present experiments in a range of MPC security models for up to ten parties, both with honest and dishonest majority as well as semi-honest and malicious security.
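The MPC protocols themselves are not reproduced here, but as a plaintext analogue of the kind of computation an inverse-square-root protocol evaluates, the sketch below runs Newton iterations on fixed-point ("quantized") values; the bit width, initial guess, and iteration count are illustrative assumptions only.

```python
# Plaintext analogue (assumed, not the paper's MPC protocol): inverse square
# root on fixed-point values via Newton iterations y <- y * (1.5 - 0.5*x*y^2),
# which only need additions and multiplications once a starting guess exists.
F = 16                                  # fractional bits

def to_fix(v):    return int(round(v * (1 << F)))
def to_float(q):  return q / (1 << F)
def fmul(a, b):   return (a * b) >> F   # fixed-point multiply

def inv_sqrt_fix(qx, iters=4):
    # Crude starting guess; a real protocol would compute this obliviously.
    y = to_fix(1.0 / (to_float(qx) ** 0.5) * 0.9)
    three_halves, half = to_fix(1.5), to_fix(0.5)
    for _ in range(iters):
        t = fmul(fmul(half, qx), fmul(y, y))   # 0.5 * x * y^2
        y = fmul(y, three_halves - t)          # Newton step
    return y

x = 2.0
print(to_float(inv_sqrt_fix(to_fix(x))), 1 / x ** 0.5)
```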


S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training

Ramachandran, Prashanthi, Agarwal, Shivam, Mondal, Arup, Shah, Aastha, Gupta, Debayan

arXiv.org Artificial Intelligence

We introduce S++, a simple, robust, and deployable framework for training a neural network (NN) using private data from multiple sources, using secret-shared secure function evaluation. In short, consider a virtual third party to whom every data holder sends their inputs, and which computes the neural network: in our case, this virtual third party is actually a set of servers which individually learn nothing, even with a malicious (but non-colluding) adversary. Previous work in this area has been limited to just one specific activation function, ReLU, rendering the approach impractical for many use cases. For the first time, we provide fast and verifiable protocols for all common activation functions and optimize them for running in a secret-shared manner. The ability to quickly, verifiably, and robustly compute exponentiation, softmax, sigmoid, etc., allows us to use previously written NNs without modification, vastly reducing developer effort and code complexity. ReLU has been found to converge faster and be more computationally efficient than saturating non-linearities such as the sigmoid or tanh. However, we argue that it would be remiss not to extend the mechanism to functions such as the logistic sigmoid, tanh, and softmax, which are fundamental due to their ability to express outputs as probabilities and their universal approximation property. Their role in RNNs and several recent advances also make them relevant.
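For reference, and only as the plaintext functions being targeted rather than the secret-shared protocols themselves, numerically stable versions of the sigmoid and softmax look like this:

```python
# Plaintext reference implementations of sigmoid and softmax, the functions
# whose exponentiations such frameworks must evaluate under secret sharing.
import numpy as np

def sigmoid(x):
    # Piecewise form keeps exp's argument non-positive, avoiding overflow.
    return np.where(x >= 0, 1.0 / (1.0 + np.exp(-x)),
                    np.exp(x) / (1.0 + np.exp(x)))

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))
print(softmax(np.array([1.0, 2.0, 3.0])))
```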


Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties

Sabater, César, Bellet, Aurélien, Ramon, Jan

arXiv.org Machine Learning

Learning from data owned by several parties, as in federated learning, raises challenges regarding the privacy guarantees provided to participants and the correctness of the computation in the presence of malicious parties. We tackle these challenges in the context of distributed averaging, an essential building block of distributed and federated learning. Our first contribution is a novel distributed differentially private protocol which naturally scales with the number of parties. The key idea underlying our protocol is to exchange correlated Gaussian noise along the edges of a network graph, complemented by independent noise added by each party. We analyze the differential privacy guarantees of our protocol and the impact of the graph topology, showing that we can match the accuracy of the trusted curator model even when each party communicates with only a logarithmic number of other parties chosen at random. This is in contrast with protocols in the local model of privacy (with lower accuracy) or based on secure aggregation (where all pairs of users need to exchange messages). Our second contribution is to enable users to prove the correctness of their computations without compromising the efficiency and privacy guarantees of the protocol. Our construction relies on standard cryptographic primitives like commitment schemes and zero knowledge proofs.
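A toy sketch of the central mechanism (greatly simplified, with arbitrary noise scales rather than the paper's calibrated variances) is shown below: pairwise-correlated Gaussian terms exchanged along graph edges cancel exactly in the average, while each party also adds a small amount of independent noise.

```python
# Simplified illustration of correlated-noise averaging: edge noise cancels
# in the sum, so only the small independent noise perturbs the average.
import numpy as np

rng = np.random.default_rng(0)
n = 6
values = rng.uniform(0, 1, n)                    # each party's private value
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.5]

noisy = values.copy()
for i, j in edges:
    eta = rng.normal(0, 1.0)                     # correlated term: +eta for i, -eta for j
    noisy[i] += eta
    noisy[j] -= eta
noisy += rng.normal(0, 0.01, n)                  # independent noise per party

print("true average :", values.mean())
print("noisy average:", noisy.mean())            # edge noise cancels exactly in the sum
```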


R Fundamentals: Building a Simple Grade Calculator

#artificialintelligence

R is one of the most popular languages for statistical analysis, data science, and reporting. At Dataquest, we have been adding R courses (you can learn more in our recent update). In this tutorial, we'll teach you the basics of R by building a simple grade calculator. While we do not assume any R-specific knowledge, you should be familiar with general programming concepts. This tutorial is based on part of our newly released introductory R course.


An Analysis of Arithmetic Constraints on Integer Intervals

Apt, Krzysztof R., Zoeteweij, Peter

arXiv.org Artificial Intelligence

Arithmetic constraints on integer intervals are supported in many constraint programming systems. We study here a number of approaches to implement constraint propagation for these constraints. To describe them we introduce integer interval arithmetic. Each approach is explained using appropriate proof rules that reduce the variable domains. We compare these approaches using a set of benchmarks. For the most promising approach we provide results that characterize the effect of constraint propagation. This is a full version of our earlier paper, cs.PL/0403016.
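As a small illustration of integer interval arithmetic (generic rules, not the paper's formal proof rules), bounds for the sum and product of two integer intervals can be computed as follows:

```python
# Integer interval arithmetic: the kind of bound computation that constraint
# propagation over integer intervals relies on.
def add(a, b):
    (al, ah), (bl, bh) = a, b
    return (al + bl, ah + bh)

def mul(a, b):
    (al, ah), (bl, bh) = a, b
    products = [al * bl, al * bh, ah * bl, ah * bh]
    return (min(products), max(products))

x = (-2, 3)
y = (4, 5)
print(add(x, y))   # (2, 8)
print(mul(x, y))   # (-10, 15)
```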