Nachum, Ido
Batch Normalization Decomposed
Nachum, Ido, Bondaschi, Marco, Gastpar, Michael, Khina, Anatoly
Batch normalization is a successful building block of neural network architectures. Yet, it is not well understood. A neural network layer with batch normalization comprises three components that affect the representation induced by the network: recentering the mean of the representation to zero, rescaling the variance of the representation to one, and finally applying a non-linearity. Our work builds on that of Daneshmand, Joudaki, and Bach [NeurIPS '21], which studied deep linear neural networks with only the rescaling stage between layers at initialization. In our work, we analyze the other two key components of networks with batch normalization, namely, the recentering and the non-linearity. When these two components are present, we observe a curious behavior at initialization. Through the layers, the representation of the batch converges to a single cluster, except for a single data point that breaks away from the cluster in an orthogonal direction. We shed light on this behavior from two perspectives: (1) we analyze the geometrical evolution of a simplified indicative model; (2) we prove a stability result for the aforementioned configuration.
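The following is a minimal numerical sketch of this setting (assuming a particular batch size, width, depth, Gaussian weight initialization, and a ReLU non-linearity; it is an illustration, not the paper's exact model): a random batch is propagated through layers that recenter, rescale, and apply a non-linearity, and the pairwise geometry of the resulting representation can then be inspected.

```python
# Minimal sketch (not the paper's exact setup): propagate a random batch
# through a chain of random linear layers, each followed by batch recentering,
# rescaling to unit variance, and a ReLU non-linearity, at initialization.
import numpy as np

rng = np.random.default_rng(0)
batch_size, width, depth = 32, 512, 50

X = rng.standard_normal((batch_size, width))
for _ in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)  # random weights
    X = X @ W
    X = X - X.mean(axis=0, keepdims=True)           # recentering
    X = X / (X.std(axis=0, keepdims=True) + 1e-8)   # rescaling
    X = np.maximum(X, 0.0)                          # non-linearity (ReLU)

# Cosine similarities between data points in the final representation: a
# collapse to a single cluster plus one outlying point would appear as one
# row/column whose similarities differ markedly from the rest.
norms = np.linalg.norm(X, axis=1, keepdims=True)
cos = (X @ X.T) / (norms * norms.T + 1e-8)
print(np.round(cos, 2))
```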
Which Algorithms Have Tight Generalization Bounds?
Gastpar, Michael, Nachum, Ido, Shafer, Jonathan, Weinberger, Thomas
We study which machine learning algorithms have tight generalization bounds. First, we present conditions that preclude the existence of tight generalization bounds. Specifically, we show that algorithms with certain inductive biases that make them unstable do not admit tight generalization bounds. Next, we show that algorithms that are sufficiently stable do have tight generalization bounds. We conclude with a simple characterization that relates the existence of tight generalization bounds to the conditional variance of the algorithm's loss.
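In schematic notation (a sketch of the standard quantities involved, assuming a loss $\ell$ bounded in $[0,1]$ and a sample $S=(z_1,\dots,z_m)\sim\mathcal{D}^m$; the paper's precise definitions may differ), tightness compares a bound to the population loss of the learned hypothesis:

```latex
% Schematic notation only; the paper's precise definitions may differ.
% Empirical and population losses of the hypothesis A(S) learned from S:
\[
L_S\bigl(A(S)\bigr) = \frac{1}{m}\sum_{i=1}^{m} \ell\bigl(A(S), z_i\bigr),
\qquad
L_{\mathcal{D}}\bigl(A(S)\bigr) = \mathbb{E}_{z\sim\mathcal{D}}\bigl[\ell\bigl(A(S), z\bigr)\bigr].
\]
% A generalization bound b is tight for A (roughly) if it stays close to the
% population loss with high probability over the draw of the sample:
\[
\bigl|\, b(S) - L_{\mathcal{D}}\bigl(A(S)\bigr) \,\bigr| \;\le\; \varepsilon
\qquad \text{with high probability over } S \sim \mathcal{D}^m,
\]
% and the characterization is phrased via the conditional variance of the
% algorithm's loss (with the conditioning as defined in the paper).
```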
Fantastic Generalization Measures are Nowhere to be Found
Gastpar, Michael, Nachum, Ido, Shafer, Jonathan, Weinberger, Thomas
We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small for all learning algorithms and all population distributions. Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize in the overparameterized setting. However, in their paper ``Fantastic Generalization Measures and Where to Find Them,'' Jiang et al. (2020) examine more than a dozen generalization bounds and show empirically that none of them is uniformly tight. This raises the question of whether uniformly tight generalization bounds are at all possible in the overparameterized setting. We consider two types of generalization bounds: (1) bounds that may depend on the training set and the learned hypothesis (e.g., margin bounds), for which we prove mathematically that no such bound can be uniformly tight in the overparameterized setting; and (2) bounds that may additionally depend on the learning algorithm (e.g., stability bounds), for which we show a trade-off between the algorithm's performance and the bound's tightness. Namely, if the algorithm achieves good accuracy on certain distributions, then no generalization bound can be uniformly tight for it in the overparameterized setting. We explain how these formal results can, in our view, inform research on generalization bounds for neural networks, while stressing that other interpretations of these results are also possible.
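Schematically (a sketch of the two bound families just described, not the paper's exact formalism):

```latex
% Schematic notation only; the paper's precise formulation may differ.
% Type (1): bounds that may depend on the training set S and the learned
%           hypothesis h = A(S), e.g. margin bounds:
\[
L_{\mathcal{D}}(h) \;\le\; b_1(S, h).
\]
% Type (2): bounds that may additionally depend on the learning algorithm A,
%           e.g. stability bounds:
\[
L_{\mathcal{D}}\bigl(A(S)\bigr) \;\le\; b_2\bigl(S, A(S), A\bigr).
\]
% Uniform tightness asks that the gap between the bound and the population
% loss be small for all learning algorithms and all population distributions.
```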
On Symmetry and Initialization for Neural Networks
Nachum, Ido, Yehudayoff, Amir
This work provides an additional step in the theoretical understanding of neural networks. We consider neural networks with one hidden layer and show that when learning symmetric functions, one can choose initial conditions so that standard SGD training efficiently produces generalization guarantees. We verify this empirically and show that the guarantee does not hold when the initial conditions are chosen at random. The proof of convergence investigates the interaction between the two layers of the network. Our results highlight the importance of using symmetry in the design of neural networks.
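As an illustration of why symmetry-aware initial conditions help (a hypothetical construction for intuition, not the initialization or training procedure analyzed in the paper): a symmetric Boolean function depends on its input only through the number of ones, so hidden units built from all-ones weight vectors with staggered biases turn any symmetric target into a linear function of the hidden activations.

```python
# Illustrative sketch only (a hypothetical construction, not the paper's
# initialization): for a symmetric Boolean target f(x) = g(sum(x)), hidden
# ReLU units with all-ones weights and staggered biases make f an exact
# linear function of the hidden activations, so fitting the output layer
# alone suffices on this representation.
import numpy as np

rng = np.random.default_rng(0)
n = 10                                    # input dimension
X = rng.integers(0, 2, size=(2000, n))    # random Boolean inputs
y = (X.sum(axis=1) % 2).astype(float)     # parity: a symmetric target

# Symmetry-aware hidden layer: unit j computes relu(sum(x) - j + 1),
# a "count" feature that depends on x only through sum(x).
H = np.maximum(X.sum(axis=1, keepdims=True) - np.arange(n + 1) + 1, 0.0)

# Fit only the output layer by least squares (a stand-in for training it).
w, *_ = np.linalg.lstsq(H, y, rcond=None)
pred = (H @ w > 0.5).astype(float)
print("accuracy with a symmetry-aware hidden layer:", (pred == y).mean())
```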
On the Perceptron's Compression
Moran, Shay, Nachum, Ido, Panasoff, Itai, Yehudayoff, Amir
We study and provide an exposition of several phenomena related to the perceptron's compression. One theme concerns modifications of the perceptron algorithm that yield better guarantees on the margin of the hyperplane it outputs. These modifications can be useful in training neural networks as well, and we demonstrate them with some experimental data. In a second theme, we deduce conclusions from the perceptron's compression in various contexts.
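For context, here is a short sketch of the classical perceptron together with one classical margin-seeking modification (updating whenever the functional margin falls below a threshold beta, rather than only on mistakes); this is an illustration of the general idea and not necessarily one of the specific modifications studied in the paper.

```python
# Classical perceptron and a margin-seeking variant: with beta = 0 this is the
# standard algorithm; with beta > 0 it keeps updating on points whose
# functional margin is below beta.
import numpy as np

def perceptron(X, y, epochs=100, beta=0.0):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        updated = False
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= beta:   # mistake, or margin below beta
                w += y_i * x_i
                updated = True
        if not updated:
            break
    return w

# Linearly separable toy data with labels in {-1, +1}.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5))

for beta in (0.0, 1.0):
    w = perceptron(X, y, beta=beta)
    margins = y * (X @ w) / (np.linalg.norm(w) * np.linalg.norm(X, axis=1))
    print(f"beta={beta}: worst normalized margin = {margins.min():.3f}")
```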
A Direct Sum Result for the Information Complexity of Learning
Nachum, Ido, Shafer, Jonathan, Yehudayoff, Amir
How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al. (2018), where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis output by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega\left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or randomized). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of information complexity of a given class of functions $\mathcal{H}$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{H}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.
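In schematic notation (paraphrasing the quantities described above; the paper's precise definitions may differ in their details):

```latex
% Schematic notation; the paper's precise definitions may differ in details.
% Information complexity of a class H: the least amount of information that a
% consistent and proper learner for H must retain about its input sample S.
\[
\mathrm{IC}(\mathcal{H}) \;=\;
\inf_{\substack{A \text{ consistent} \\ \text{and proper}}} \;\sup_{S}\;
\mathrm{I}\bigl(S; A(S)\bigr).
\]
% The lower bound exhibited in the paper: there is a class H of VC dimension d
% over the domain X whose information complexity satisfies
\[
\mathrm{IC}(\mathcal{H}) \;=\; \Omega\!\left(d \log\log \frac{|\mathcal{X}|}{d}\right).
\]
% Direct sum (roughly): information complexity adds up when classes are combined.
```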
Learners that Leak Little Information
Bassily, Raef, Moran, Shay, Nachum, Ido, Shafer, Jonathan, Yehudayoff, Amir
We study learning algorithms that are restricted to using a small amount of information from their input sample. We introduce a category of learning algorithms we term d-bit information learners, which are algorithms whose output conveys at most d bits of information about their input. A central theme in this work is that such algorithms generalize. We focus on the learning capacity of these algorithms, and prove sample complexity bounds with tight dependencies on the confidence and error parameters. We also observe connections with well-studied notions such as sample compression schemes, Occam's razor, PAC-Bayes, and differential privacy. We discuss an approach that allows us to prove upper bounds on the amount of information that algorithms reveal about their inputs, and also provide a lower bound by showing a simple concept class for which every (possibly randomized) empirical risk minimizer must reveal a lot of information. On the other hand, we show that in the distribution-dependent setting every VC class has empirical risk minimizers that do not reveal a lot of information.
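Schematically, with $L_S$ and $L_{\mathcal{D}}$ denoting empirical and population losses and $m$ the sample size (the second display is standard background in the style of Xu and Raginsky (2017), stated only for intuition and not as this paper's bound):

```latex
% Schematic definition: a d-bit information learner is an algorithm A whose
% output reveals at most d bits about its input sample S.
\[
\mathrm{I}\bigl(S; A(S)\bigr) \;\le\; d .
\]
% One standard route from bounded information to generalization (Xu and
% Raginsky, 2017; stated only for intuition, not as this paper's bound):
% for a loss with values in [0,1] and mutual information measured in nats,
\[
\Bigl|\, \mathbb{E}\bigl[ L_{\mathcal{D}}\bigl(A(S)\bigr) - L_S\bigl(A(S)\bigr) \bigr] \,\Bigr|
\;\le\; \sqrt{\frac{\mathrm{I}\bigl(S;A(S)\bigr)}{2m}} .
\]
```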