HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
Armand Foucault, Franck Mamalet, François Malgouyres
arXiv.org Artificial Intelligence
Binary and sparse ternary weights in neural networks enable faster computations and lighter representations, facilitating their use on edge devices with limited computational power. Meanwhile, vanilla RNNs are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging. To date, no method has successfully achieved binarization or ternarization of vanilla RNN weights. We present a new approach leveraging the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary orthogonal matrices. This method enables the training of orthogonal RNNs (ORNNs) with binary and sparse ternary recurrent weights, effectively creating a specific class of binary and sparse ternary vanilla RNNs. The resulting ORNNs, called HadamRNN and Block-HadamRNN, are evaluated on benchmarks such as the copy task, the permuted and sequential MNIST tasks, and the IMDB dataset. Despite binarization or sparse ternarization, these RNNs maintain performance levels comparable to state-of-the-art full-precision models, highlighting the effectiveness of our approach. Notably, our approach is the first solution with binary recurrent weights capable of tackling the copy task over 1000 timesteps.

A Recurrent Neural Network (RNN) is a neural network architecture relying on a recurrent computation mechanism at its core. These networks are well-suited for the processing of time series, thanks to their ability to model temporal dependence within data sequences. Modern RNN architectures typically rely on millions, or even billions, of parameters to perform optimally. This necessitates substantial storage space and costly matrix-vector products at inference time, which may result in computational delays. These features can be prohibitive when applications must operate in real-time or on edge devices with limited computational resources. A compelling strategy to alleviate this problem is to replace the full-precision weights within the network with weights having a low-bit representation. This strategy, known as neural network quantization (Courbariaux et al., 2015; Lin et al., 2015; Courbariaux et al., 2016; Hubara et al., 2017; Zhou et al., 2016), has been extensively studied in recent years.
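The mathematical fact underlying this parameterization is that a Hadamard matrix H of order n has entries in {+1, -1} and satisfies H Hᵀ = nI, so H/√n is orthogonal. The sketch below is a minimal NumPy illustration of that fact, not the paper's actual training scheme (which parameterizes a broader subset of binary and sparse ternary orthogonal matrices); the function and variable names (`sylvester_hadamard`, `ornn_step`, `W_rec`) are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def sylvester_hadamard(k):
    """Build a 2**k x 2**k Hadamard matrix via the Sylvester construction.
    Entries are +1/-1 and H @ H.T == n * I, so H / sqrt(n) is orthogonal."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

# Illustrative hidden size n = 2**7 = 128 (any power of two works here).
H = sylvester_hadamard(7)
n = H.shape[0]

# Binary-weight orthogonal recurrent matrix: the entries of H are +-1, and the
# single 1/sqrt(n) factor can be kept as one shared full-precision scale.
W_rec = H / np.sqrt(n)
assert np.allclose(W_rec @ W_rec.T, np.eye(n))

def ornn_step(h, x, W_rec, W_in, b):
    """One vanilla-RNN step using the orthogonal recurrent weight matrix."""
    return np.tanh(W_rec @ h + W_in @ x + b)

# Example step with a random input and a zero hidden state (hypothetical setup).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W_in = 0.1 * rng.normal(size=(n, 8))
h = ornn_step(np.zeros(n), x, W_rec, W_in, np.zeros(n))
```

Because every entry of H is ±1, such a recurrent matrix can in principle be stored with one bit per weight plus a single shared scale, which is the kind of lightweight representation the abstract refers to.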
Feb-5-2025