AITopics | Mordido, Gonçalo

Collaborating Authors

Mordido, Gonçalo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring Quantization for Efficient Pre-Training of Transformer Language Models

Chitsaz, Kamran, Fournier, Quentin, Mordido, Gonçalo, Chandar, Sarath

arXiv.org Artificial IntelligenceJul-16-2024

The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.

machine learning, natural language, quantization, (18 more...)

arXiv.org Artificial Intelligence

2407.11722

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Lookbehind Optimizer: k steps back, 1 step forward

Mordido, Gonçalo, Malviya, Pranshu, Baratin, Aristide, Chandar, Sarath

arXiv.org Artificial IntelligenceJul-31-2023

The Lookahead optimizer improves the training stability of deep neural networks by having a set of fast weights that "look ahead" to guide the descent direction. Here, we combine this idea with sharpness-aware minimization (SAM) to stabilize its multi-step variant and improve the loss-sharpness trade-off. We propose Lookbehind, which computes $k$ gradient ascent steps ("looking behind") at each iteration and combine the gradients to bias the descent step toward flatter minima. We apply Lookbehind on top of two popular sharpness-aware training methods -- SAM and adaptive SAM (ASAM) -- and show that our approach leads to a myriad of benefits across a variety of tasks and training regimes. Particularly, we show increased generalization performance, greater robustness against noisy weights, and higher tolerance to catastrophic forgetting in lifelong learning settings.

artificial intelligence, lookbehind, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2307.16704

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Promoting Exploration in Memory-Augmented Adam using Critical Momenta

Malviya, Pranshu, Mordido, Gonçalo, Baratin, Aristide, Harikandeh, Reza Babanezhad, Huang, Jerry, Lacoste-Julien, Simon, Pascanu, Razvan, Chandar, Sarath

arXiv.org Artificial IntelligenceJul-18-2023

Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.

artificial intelligence, machine learning, optimizer, (16 more...)

arXiv.org Artificial Intelligence

2307.09638

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SAMSON: Sharpness-Aware Minimization Scaled by Outlier Normalization for Improving DNN Generalization and Robustness

Mordido, Gonçalo, Henwood, Sébastien, Chandar, Sarath, Leduc-Primeau, François

arXiv.org Artificial IntelligenceMar-21-2023

Energy-efficient deep neural network (DNN) accelerators are prone to non-idealities that degrade DNN performance at inference time. To mitigate such degradation, existing methods typically add perturbations to the DNN weights during training to simulate inference on noisy hardware. However, this often requires knowledge about the target hardware and leads to a trade-off between DNN performance and robustness, decreasing the former to increase the latter. In this work, we show that applying sharpness-aware training, by optimizing for both the loss value and loss sharpness, significantly improves robustness to noisy hardware at inference time without relying on any assumptions about the target hardware. In particular, we propose a new adaptive sharpness-aware method that conditions the worst-case perturbation of a given weight not only on its magnitude but also on the range of the weight distribution. This is achieved by performing sharpness-aware minimization scaled by outlier minimization (SAMSON). Our approach outperforms existing sharpness-aware training methods both in terms of model generalization performance in noiseless regimes and robustness in noisy settings, as measured on several architectures and datasets.

artificial intelligence, machine learning, sharpness-aware minimization scaled, (3 more...)

arXiv.org Artificial Intelligence

2211.11561

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.53)

Add feedback

Evaluating Post-Training Compression in GANs using Locality-Sensitive Hashing

Mordido, Gonçalo, Yang, Haojin, Meinel, Christoph

arXiv.org Artificial IntelligenceMar-22-2021

The analysis of the compression effects in generative adversarial networks (GANs) after training, i.e. without any fine-tuning, remains an unstudied, albeit important, topic with the increasing trend of their computation and memory requirements. While existing works discuss the difficulty of compressing GANs during training, requiring novel methods designed with the instability of GANs training in mind, we show that existing compression methods (namely clipping and quantization) may be directly applied to compress GANs post-training, without any additional changes. High compression levels may distort the generated set, likely leading to an increase of outliers that may negatively affect the overall assessment of existing k-nearest neighbor (KNN) based metrics. We propose two new precision and recall metrics based on locality-sensitive hashing (LSH), which, on top of increasing the outlier robustness, decrease the complexity of assessing an evaluation sample against $n$ reference samples from $O(n)$ to $O(\log(n))$, if using LSH and KNN, and to $O(1)$, if only applying LSH. We show that low-bit compression of several pre-trained GANs on multiple datasets induces a trade-off between precision and recall, retaining sample quality while sacrificing sample diversity.

artificial intelligence, bit 32, neural network, (17 more...)

arXiv.org Artificial Intelligence

2103.11912

Country:

Europe > Germany (0.14)
Europe > Spain (0.14)
Europe > France (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

Mark-Evaluate: Assessing Language Generation using Population Estimation Methods

Mordido, Gonçalo, Meinel, Christoph

arXiv.org Artificial IntelligenceOct-9-2020

We propose a family of metrics to assess language generation derived from population estimation methods widely used in ecology. More specifically, we use mark-recapture and maximum-likelihood methods that have been applied over the past several decades to estimate the size of closed populations in the wild. We propose three novel metrics: ME$_\text{Petersen}$ and ME$_\text{CAPTURE}$, which retrieve a single-valued assessment, and ME$_\text{Schnabel}$ which returns a double-valued metric to assess the evaluation set in terms of quality and diversity, separately. In synthetic experiments, our family of methods is sensitive to drops in quality and diversity. Moreover, our methods show a higher correlation to human evaluation than existing metrics on several challenging tasks, namely unconditional language generation, machine translation, and text summarization.

artificial intelligence, evaluation, machine translation, (18 more...)

arXiv.org Artificial Intelligence

2010.04606

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.92)

Add feedback

Instant Quantization of Neural Networks using Monte Carlo Methods

Mordido, Gonçalo, Van Keirsbilck, Matthijs, Keller, Alexander

arXiv.org Machine LearningMay-29-2019

Low bit-width integer weights and activations are very important for efficient inference, especially with respect to lower power consumption. We propose Monte Carlo methods to quantize the weights and activations of pre-trained neural networks without any re-training. By performing importance sampling we obtain quantized low bit-width integer values from full-precision weights and activations. The precision, sparsity, and complexity are easily configurable by the amount of sampling performed. Our approach, called Monte Carlo Quantization (MCQ), is linear in both time and space, with the resulting quantized, sparse networks showing minimal accuracy loss when compared to the original full-precision networks. Our method either outperforms or achieves competitive results on multiple benchmarks compared to previous quantization methods that do require additional training.

activation, deep learning, neural network, (16 more...)

arXiv.org Machine Learning

1905.12253

Country:

Europe > Germany (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dropout-GAN: Learning from a Dynamic Ensemble of Discriminators

Mordido, Gonçalo, Yang, Haojin, Meinel, Christoph

arXiv.org Machine LearningJul-30-2018

We propose to incorporate adversarial dropout in generative multi-adversarial networks, by omitting or dropping out, the feedback of each discriminator in the framework with some probability at the end of each batch. Our approach forces the single generator not to constrain its output to satisfy a single discriminator, but, instead, to satisfy a dynamic ensemble of discriminators. We show that this leads to a more generalized generator, promoting variety in the generated samples and avoiding the common mode collapse problem commonly experienced with generative adversarial networks (GANs). We further provide evidence that the proposed framework, named Dropout-GAN, promotes sample diversity both within and across epochs, eliminating mode collapse and stabilizing training.

artificial intelligence, discriminator, neural network, (18 more...)

arXiv.org Machine Learning

1807.11346

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback