AITopics | Joudaki, Amir

Plotting

Joudaki, Amir

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations

Joudaki, Amir, Hofmann, Thomas

arXiv.org Machine LearningOct-29-2024

Understanding how neural networks transform input data across layers is fundamental to unraveling their learning and generalization capabilities. Although prior work has used insights from kernel methods to study neural networks, a global analysis of how the similarity between hidden representations evolves across layers remains underexplored. In this paper, we introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs. Operating under the mean-field regime, we show that the kernel sequence evolves deterministically via a kernel map, which only depends on the activation function. By expanding activation using Hermite polynomials and using their algebraic properties, we derive an explicit form for kernel map and fully characterize its fixed points. Our analysis reveals that for nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to orthogonal or similar representations depending on the activation and network architecture. We further extend our results to networks with residual connections and normalization layers, demonstrating similar convergence behaviors. This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2410.20107

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Learning Genomic Sequence Representations using Graph Neural Networks over De Bruijn Graphs

Kapuśniak, Kacper, Burger, Manuel, Rätsch, Gunnar, Joudaki, Amir

arXiv.org Artificial IntelligenceDec-6-2023

The rapid expansion of genomic sequence data calls for new methods to achieve robust sequence representations. Existing techniques often neglect intricate structural details, emphasizing mainly contextual information. To address this, we developed k-mer embeddings that merge contextual and structural string information by enhancing De Bruijn graphs with structural similarity connections. Subsequently, we crafted a self-supervised method based on Contrastive Learning that employs a heterogeneous Graph Convolutional Network encoder and constructs positive pairs based on node similarities. Our embeddings consistently outperform prior techniques for Edit Distance Approximation and Closest String Retrieval tasks.

bioinformatics, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.03865

Country: Europe > Switzerland > Zürich > Zürich (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

On the impact of activation and normalization in obtaining isometric embeddings at initialization

Joudaki, Amir, Daneshmand, Hadi, Bach, Francis

arXiv.org Machine LearningNov-17-2023

In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs. In several architectures it has been observed that this Gram matrix becomes degenerate with depth at initialization, which dramatically slows training. Normalization layers, such as batch or layer normalization, play a pivotal role in preventing the rank collapse issue. Despite promising advances, the existing theoretical results do not extend to layer normalization, which is widely used in transformers, and can not quantitatively characterize the role of non-linear activations. To bridge this gap, we prove that layer normalization, in conjunction with activation layers, biases the Gram matrix of a multilayer perceptron towards the identity matrix at an exponential rate with depth at initialization. We quantify this rate using the Hermite expansion of the activation function.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Machine Learning

2305.18399

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

Meterez, Alexandru, Joudaki, Amir, Orabona, Francesco, Immer, Alexander, Rätsch, Gunnar, Daneshmand, Hadi

arXiv.org Artificial IntelligenceOct-3-2023

Normalization layers are one of the key building blocks for deep neural networks. Several theoretical studies have shown that batch normalization improves the signal propagation, by avoiding the representations from becoming collinear across the layers. However, results on mean-field theory of batch normalization also conclude that this benefit comes at the expense of exploding gradients in depth. Motivated by these two aspects of batch normalization, in this study we pose the following question: "Can a batch-normalized network keep the optimal signal propagation properties, but avoid exploding gradients?" We answer this question in the affirmative by giving a particular construction of an Multi-Layer Perceptron (MLP) with linear activations and batch-normalization that provably has bounded gradients at any depth. Based on Weingarten calculus, we develop a rigorous and non-asymptotic theory for this constructed MLP that gives a precise characterization of forward signal propagation, while proving that gradients remain bounded for linearly independent input samples, which holds in most practical settings. Inspired by our theory, we also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.

artificial intelligence, gradient explosion, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2310.02012

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Bridging the Gap between Mean Field and Finite Width in Deep Random Neural Networks with Batch Normalization

Joudaki, Amir, Daneshmand, Hadi, Bach, Francis

arXiv.org Artificial IntelligenceFeb-20-2023

Mean field theory is widely used in the theoretical studies of neural networks. In this paper, we analyze the role of depth in the concentration of mean-field predictions, specifically for deep multilayer perceptron (MLP) with batch normalization (BN) at initialization. By scaling the network width to infinity, it is postulated that the mean-field predictions suffer from layer-wise errors that amplify with depth. We demonstrate that BN stabilizes the distribution of representations that avoids the error propagation of mean-field predictions. This stabilization, which is characterized by a geometric mixing property, allows us to establish concentration bounds for mean field predictions in infinitely-deep neural networks with a finite width.

artificial intelligence, machine learning, null, (17 more...)

arXiv.org Artificial Intelligence

2205.13076

Country:

North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Batch Normalization Orthogonalizes Representations in Deep Random Networks

Daneshmand, Hadi, Joudaki, Amir, Bach, Francis

arXiv.org Machine LearningJun-7-2021

This paper underlines a subtle property of batch-normalization (BN): Successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network. We establish a non-asymptotic characterization of the interplay between depth, width, and the orthogonality of deep representations. More precisely, under a mild assumption, we prove that the deviation of the representations from orthogonality rapidly decays with depth up to a term inversely proportional to the network width. This result has two main implications: 1) Theoretically, as the depth grows, the distribution of the representation -- after the linear layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian distribution. Furthermore, the radius of this Wasserstein ball shrinks with the width of the network. 2) In practice, the orthogonality of the representations directly influences the performance of stochastic gradient descent (SGD). When representations are initially aligned, we observe SGD wastes many iterations to orthogonalize representations before the classification. Nevertheless, we experimentally show that starting optimization from orthogonal representations is sufficient to accelerate SGD, with no need for BN.

deep learning, neural network, representation, (15 more...)

arXiv.org Machine Learning

2106.0397

Country:

North America > United States (0.14)
Europe (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback