Gamba, Matteo
On the Lipschitz Constant of Deep Networks and Double Descent
Gamba, Matteo, Azizpour, Hossein, Björkman, Mårten
A longstanding question towards understanding the remarkable generalization ability of deep networks is characterizing the hypothesis class of models trained in practice, thus isolating properties of the networks' model function that capture generalization (Hanin & Rolnick, 2019; Neyshabur et al., 2015). A central problem is understanding the role played by overparameterization (Arora et al., 2018; Neyshabur et al., 2018; Zhang et al., 2018) - a key design choice of state-of-the-art models - in promoting regularization of the model function. Modern overparameterized networks can achieve good generalization while perfectly interpolating the training set (Nakkiran et al., 2019). This phenomenon is described by the double descent curve of the test error (Belkin et al., 2019; Geiger et al., 2019): as model size increases, the error follows the classical bias-variance trade-off curve (Geman et al., 1992), peaks when a model is large enough to interpolate the training data, and then decreases again as model size grows further.
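To make the model-wise double descent curve concrete, the following is a minimal sketch - not this paper's experimental setup - using min-norm regression on random ReLU features in numpy. The number of random features stands in for model size, and all constants and names are illustrative; under these assumptions the test error typically peaks near the interpolation threshold (number of features close to the number of training samples) and descends again in the overparameterized regime.

    # Illustrative only: min-norm regression on random ReLU features.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, d = 100, 1000, 20
    w_true = rng.normal(size=d)

    def make_data(n):
        X = rng.normal(size=(n, d))
        y = X @ w_true + 0.5 * rng.normal(size=n)  # noisy linear target
        return X, y

    X_tr, y_tr = make_data(n_train)
    X_te, y_te = make_data(n_test)

    for n_features in [10, 50, 90, 100, 110, 200, 500, 2000]:
        # Frozen random first layer with ReLU; only the output weights are fit.
        W = rng.normal(size=(d, n_features)) / np.sqrt(d)
        Phi_tr = np.maximum(X_tr @ W, 0.0)
        Phi_te = np.maximum(X_te @ W, 0.0)
        beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)  # min-norm solution
        train_mse = np.mean((Phi_tr @ beta - y_tr) ** 2)
        test_mse = np.mean((Phi_te @ beta - y_te) ** 2)
        print(f"features={n_features:5d}  train MSE={train_mse:8.4f}  test MSE={test_mse:8.4f}")

In this toy setting, the train error drops to (near) zero once the model can interpolate, while the test error tends to spike around the interpolation threshold and then fall again as the feature count grows, mirroring the curve described above.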
Hyperplane Arrangements of Trained ConvNets Are Biased
Gamba, Matteo, Carlsson, Stefan, Azizpour, Hossein, Björkman, Mårten
In recent years, understanding and interpreting the inner workings of deep networks has drawn considerable attention from the community [7, 15, 16, 13]. One long-standing question is the problem of identifying the inductive bias of state-of-the-art networks and the form of implicit regularization performed by the optimizer [22, 31, 2] and possibly by natural data itself [3]. While earlier studies focused on the theoretical expressivity of deep networks and the advantage of deeper representations [20, 25, 26], a recent trend in the literature is the study of the effective capacity of trained networks [31, 32, 9, 10]. In fact, while state-of-the-art deep networks are largely overparameterized, it is hypothesized that the full theoretical capacity of a model might not be realized in practice, due to some form of self-regulation at play during learning. Some recent works have thus tried to find interpretable statistical bias that is consistently present in trained state-of-the-art models and correlates well with generalization [14, 24]. In this work, we take a geometrical perspective and look for statistical bias in the weights of trained convolutional networks, in terms of the hyperplane arrangements induced by convolutional layers with ReLU activations.
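As a hedged illustration of the objects involved: each ReLU unit of a convolutional layer defines a hyperplane in input space (the locus where its pre-activation is zero), and the units of a layer jointly induce a hyperplane arrangement whose cells correspond to activation patterns. The PyTorch sketch below uses a tiny untrained layer - not the trained networks or the statistics studied in the paper - and simply counts the distinct activation patterns met along a random 2D slice of input space; all sizes and names are illustrative.

    # Illustrative only: activation patterns of a tiny conv+ReLU layer on a 2D slice.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3)  # toy, untrained

    # Random 2D slice through the space of 1x8x8 inputs: x(a, b) = a*u + b*v.
    u = torch.randn(1, 1, 8, 8)
    v = torch.randn(1, 1, 8, 8)
    grid = torch.linspace(-3.0, 3.0, 100)
    A, B = torch.meshgrid(grid, grid, indexing="ij")
    batch = A.reshape(-1, 1, 1, 1) * u + B.reshape(-1, 1, 1, 1) * v  # (10000, 1, 8, 8)

    with torch.no_grad():
        pre_act = conv(batch)  # (10000, 4, 6, 6): one pre-activation per ReLU unit
    # The side of each unit's hyperplane an input falls on gives its activation pattern.
    patterns = {tuple(row) for row in (pre_act > 0).flatten(1).tolist()}
    print(f"distinct activation patterns on the slice: {len(patterns)}")

The paper's analysis concerns statistical regularities of such arrangements in trained networks, not this raw count; the sketch only makes the notion of "hyperplane arrangement induced by a convolutional layer" concrete.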
Deep Double Descent via Smooth Interpolation
Gamba, Matteo, Englesson, Erik, Björkman, Mårten, Azizpour, Hossein
The ability of overparameterized deep networks to interpolate noisy data, while at the same time showing good generalization performance, has recently been characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data, without considerably deviating from the ground-truth signal, thus preserving generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify sharpness of fit of the training data interpolated by neural network functions, by studying the loss landscape w.r.t. the input variable locally around each training point, over volumes around cleanly- and noisily-labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape, where noisy targets are predicted over large volumes around training data points, in contrast to existing intuition.
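One hedged way to read the measurement described above is as an estimate, per training point, of how much the loss rises over a small volume around that point in input space. The sketch below is an assumed Monte-Carlo estimator, not necessarily the paper's exact definition; the classifier model, the perturbation radius r, and the sample count are hypothetical placeholders.

    # Illustrative only: Monte-Carlo probe of loss sharpness in input space.
    import torch
    import torch.nn.functional as F

    def input_space_sharpness(model, x, y, r=0.1, n_samples=32):
        """Average increase of the loss over random perturbations of radius r
        around the inputs x (a batch of images) with labels y."""
        model.eval()
        with torch.no_grad():
            base_loss = F.cross_entropy(model(x), y)
            perturbed = []
            for _ in range(n_samples):
                delta = torch.randn_like(x)
                # Rescale each perturbation to lie on the sphere of radius r.
                delta = r * delta / delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
                perturbed.append(F.cross_entropy(model(x + delta), y))
            return (torch.stack(perturbed).mean() - base_loss).item()

Averaging such an estimate separately over cleanly- and noisily-labelled training samples, across models of increasing size and training time, is the kind of comparison the abstract describes.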