AITopics | Foucault, Armand

HadamRNN: Binary and Sparse Ternary Orthogonal RNNs

Foucault, Armand, Mamalet, Franck, Malgouyres, François

arXiv.org Artificial Intelligence, Feb-5-2025

Binary and sparse ternary weights in neural networks enable faster computations and lighter representations, facilitating their use on edge devices with limited computational power. Meanwhile, vanilla RNNs are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging. To date, no method has successfully achieved binarization or ternarization of vanilla RNN weights. We present a new approach leveraging the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary orthogonal matrices. This method enables the training of orthogonal RNNs (ORNNs) with binary and sparse ternary recurrent weights, effectively creating a specific class of binary and sparse ternary vanilla RNNs. The resulting ORNNs, called HadamRNN and Block-HadamRNN, are evaluated on benchmarks such as the copy task, the permuted and sequential MNIST tasks, and the IMDB dataset. Despite binarization or sparse ternarization, these RNNs maintain performance levels comparable to state-of-the-art full-precision models, highlighting the effectiveness of our approach. Notably, our approach is the first solution with binary recurrent weights capable of tackling the copy task over 1000 timesteps.

A Recurrent Neural Network (RNN) is a neural network architecture relying on a recurrent computation mechanism at its core. These networks are well-suited for the processing of time series, thanks to their ability to model temporal dependence within data sequences. Modern RNN architectures typically rely on millions, or even billions, of parameters to perform optimally. This necessitates substantial storage space and costly matrix-vector products at inference time, which may result in computational delays. These features can be prohibitive when applications must operate in real time or on edge devices with limited computational resources. A compelling strategy to alleviate this problem is to replace the full-precision weights within the network with weights having a low-bit representation. This strategy, known as neural network quantization (Courbariaux et al., 2015; Lin et al., 2015; Courbariaux et al., 2016; Hubara et al., 2017; Zhou et al., 2016), has been extensively studied in recent years.
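The abstract does not spell out how Hadamard matrices are used, so the following is only background: a minimal sketch of the standard fact that a Hadamard matrix has entries in {+1, -1} and mutually orthogonal rows, so that scaling it by 1/sqrt(n) gives an orthogonal, norm-preserving map. The Sylvester construction and the helper names (`sylvester_hadamard`, `matvec`, `norm`) are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def sylvester_hadamard(k):
    """Return the 2**k x 2**k Sylvester Hadamard matrix as a list of rows."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: H_{2n} = [[H, H], [H, -H]]
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h

def matvec(m, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

H = sylvester_hadamard(3)  # 8 x 8 matrix with entries in {+1, -1}
n = len(H)

# Rows are mutually orthogonal: H H^T = n * I (exact in integer arithmetic).
rows_orthogonal = all(
    sum(a * b for a, b in zip(H[i], H[j])) == (n if i == j else 0)
    for i in range(n) for j in range(n)
)

# Consequently x -> H x / sqrt(n) preserves the Euclidean norm.
x = [0.5, -1.0, 2.0, 0.0, 3.0, -2.5, 1.5, 1.0]
y = [v / math.sqrt(n) for v in matvec(H, x)]

print(rows_orthogonal, norm(x), norm(y))
```

Because the entries are all +1 or -1, applying `H` needs only additions and subtractions, with a single scalar rescaling at the end; this is the kind of saving the abstract attributes to binary weights.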

artificial intelligence, machine learning, matrix, (17 more...)

2502.00047

Country: Europe > France > Occitanie (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Quantized Approximately Orthogonal Recurrent Neural Networks

Foucault, Armand, Mamalet, Franck, Malgouyres, François

arXiv.org Artificial Intelligence, Feb-5-2024

Orthogonal recurrent neural networks (ORNNs) are an appealing option for learning tasks involving time series with long-term dependencies, thanks to their simplicity and computational stability. However, these networks often require a substantial number of parameters to perform well, which can be prohibitive in power-constrained environments, such as compact devices. One approach to address this issue is neural network quantization. The construction of such networks remains an open problem, acknowledged for its inherent instability. In this paper, we explore the quantization of the recurrent and input weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs). We investigate one post-training quantization (PTQ) strategy and three quantization-aware training (QAT) algorithms that incorporate orthogonal constraints and quantized weights. Empirical results demonstrate the advantages of employing QAT over PTQ. The most efficient model achieves results similar to state-of-the-art full-precision ORNN and LSTM on a variety of standard benchmarks, even with 3-bit quantization.
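To make the phrase "approximately orthogonal" concrete, here is a minimal sketch of what happens when an exactly orthogonal matrix is rounded to a low-bit grid after the fact. The symmetric uniform quantizer, the 2 x 2 rotation matrix, and the names `quantize_uniform` and `orthogonality_error` are all assumptions made for illustration; the abstract does not describe the paper's PTQ or QAT procedures in enough detail to reproduce them.

```python
import math

def quantize_uniform(w, bits):
    """Round each entry to a symmetric uniform grid (hypothetical quantizer).

    The grid has 2**bits - 1 levels spanning [-max|w|, +max|w|].
    """
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(x) for row in w for x in row)
    return [[round(x / scale * levels) / levels * scale for x in row] for row in w]

def orthogonality_error(w):
    """Largest absolute entry of W W^T - I."""
    n = len(w)
    return max(
        abs(sum(w[i][t] * w[j][t] for t in range(n)) - (1.0 if i == j else 0.0))
        for i in range(n) for j in range(n)
    )

# An exactly orthogonal matrix: a planar rotation by 0.3 radians.
theta = 0.3
W = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

err_full = orthogonality_error(W)
err_8bit = orthogonality_error(quantize_uniform(W, 8))
err_3bit = orthogonality_error(quantize_uniform(W, 3))

print(err_full, err_8bit, err_3bit)
```

For this particular matrix the deviation from orthogonality is larger at 3 bits than at 8 bits, which illustrates why a quantized ORNN is only approximately orthogonal. The comparison is specific to this example and is not a general monotonicity claim.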

artificial intelligence, machine learning, survey article, (13 more...)

2402.04012

Country: Europe > France (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A general approximation lower bound in $L^p$ norm, with applications to feed-forward neural networks

Achour, El Mehdi, Foucault, Armand, Gerchinovitz, Sébastien, Malgouyres, François

arXiv.org Artificial Intelligence, Dec-20-2022

We study the fundamental limits to the expressive power of neural networks. Given two sets $F$, $G$ of real-valued functions, we first prove a general lower bound on how well functions in $F$ can be approximated in $L^p(\mu)$ norm by functions in $G$, for any $p \geq 1$ and any probability measure $\mu$. The lower bound depends on the packing number of $F$, the range of $F$, and the fat-shattering dimension of $G$. We then instantiate this bound to the case where $G$ corresponds to a piecewise-polynomial feed-forward neural network, and describe in detail the application to two sets $F$: Hölder balls and multivariate monotonic functions. Besides matching (known or new) upper bounds up to log factors, our lower bounds shed some light on the similarities or differences between approximation in $L^p$ norm or in sup norm, solving an open question by DeVore et al. (2021). Our proof strategy differs from the sup norm case and uses a key probability result of Mendelson (2002).
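As a reading aid, the quantity being lower-bounded can plausibly be written as a worst-case approximation error. The symbol $\mathcal{E}_p$ is an assumed notation introduced here, not one taken from the abstract, and the paper's exact formulation may differ.

```latex
\[
\mathcal{E}_p(F, G, \mu)
  \;=\; \sup_{f \in F} \, \inf_{g \in G} \, \| f - g \|_{L^p(\mu)},
\qquad
\| h \|_{L^p(\mu)} \;=\; \Bigl( \int |h|^p \, d\mu \Bigr)^{1/p}.
\]
```

Under this reading, a lower bound on $\mathcal{E}_p$ says that some function in $F$ cannot be approximated by any function in $G$ to better than the stated accuracy.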

artificial intelligence, machine learning, neural network, (19 more...)

2206.0436

Country: Europe (0.46); North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
