AITopics | Seroussi, Inbar

Collaborating Authors

Seroussi, Inbar

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Applications of Statistical Field Theory in Deep Learning

Ringel, Zohar, Rubin, Noa, Mor, Edo, Helias, Moritz, Seroussi, Inbar

arXiv.org Machine LearningFeb-28-2025

Deep learning algorithms have made incredible strides in the past decade yet due to the complexity of these algorithms, the science of deep learning remains in its early stages. Being an experimentally driven field, it is natural to seek a theory of deep learning within the physics paradigm. As deep learning is largely about learning functions and distributions over functions, statistical field theory, a rich and versatile toolbox for tackling complex distributions over functions (fields) is an obvious choice of formalism. Research efforts carried out in the past few years have demonstrated the ability of field theory to provide useful insights on generalization, implicit bias, and feature learning effects. Here we provide a pedagogical review of this emerging line of research.

approximation, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2502.18553

Country:

North America > United States (1.00)
Europe (0.92)
Asia > Middle East > Israel (0.27)

Genre:

Research Report (0.64)
Instructional Material (0.45)

Industry:

Energy (0.46)
Education (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning

Rubin, Noa, Fischer, Kirsten, Lindner, Javed, Dahmen, David, Seroussi, Inbar, Ringel, Zohar, Krämer, Michael, Helias, Moritz

arXiv.org Machine LearningFeb-5-2025

Theoretically describing feature learning in neural networks is crucial for understanding their expressive power and inductive biases, motivating various approaches. Some approaches describe network behavior after training through a simple change in kernel scale from initialization, resulting in a generalization power comparable to a Gaussian process. Conversely, in other approaches training results in the adaptation of the kernel to the data, involving complex directional changes to the kernel. While these approaches capture different facets of network behavior, their relationship and respective strengths across scaling regimes remains an open question. This work presents a theoretical framework of multi-scale adaptive feature learning bridging these approaches. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. A systematic expansion of the network's probability distribution reveals that mean-field scaling requires only a saddle-point approximation, while standard scaling necessitates additional correction terms. Remarkably, we find across regimes that kernel adaptation can be reduced to an effective kernel rescaling when predicting the mean network output of a linear network. However, even in this case, the multi-scale adaptive approach captures directional feature learning effects, providing richer insights than what could be recovered from a rescaling of the kernel alone.

artificial intelligence, machine learning, multi-scale adaptive theory, (16 more...)

arXiv.org Machine Learning

2502.0321

Country:

Asia > Middle East > Israel (0.28)
North America > United States > Texas > Clay County (0.25)

Genre: Research Report (0.65)

Industry:

Information Technology > Networks (0.68)
Telecommunications > Networks (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Collins-Woodfin, Elizabeth, Seroussi, Inbar, Malaxechebarría, Begoña García, Mackenzie, Andrew W., Paquette, Elliot, Paquette, Courtney

arXiv.org Machine LearningMay-29-2024

In deterministic optimization, adaptive stepsize strategies, such as line search (see [39], therein), AdaGrad-Norm [55], Polyak stepsize [46], and others were developed to provide stability and improve efficiency and adaptivity to unknown parameters. While the practical benefits for deterministic optimization problems are well-documented, much of our understanding of adaptive learning rate strategies for stochastic algorithms are still in their infancy. There are many adaptive learning rate strategies used in machine learning with many design goals. Some are known to adapt to SGD gradient noise while others are robust to hyper-parameters (e.g., [4, 59]). Theoretical results for adaptive algorithms tend to focus on guaranteeing minimax-optimal rates, but this theory is not engineered to provide realistic performance comparisons; indeed many adaptive algorithms are minimax-optimal, and so more precise statements are needed to distinguish them. For instance, the exact learning rates (or rate schedules) to which these strategies converge are unknown, nor their dependence on the geometry of the problem. Moreover, we often do not know how these adaptive stepsizes compare with well-tuned constant or decaying fixed learning rate stochastic gradient descent (SGD), which can be viewed as a cost associated with selecting the adaptive strategy in comparison to tuning by hand.

adagrad-norm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2405.19585

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

Optimal minimax rate of learning interaction kernels

Wang, Xiong, Seroussi, Inbar, Lu, Fei

arXiv.org Machine LearningNov-28-2023

Nonparametric estimation of nonlocal interaction kernels is crucial in various applications involving interacting particle systems. The inference challenge, situated at the nexus of statistical learning and inverse problems, comes from the nonlocal dependency. A central question is whether the optimal minimax rate of convergence for this problem aligns with the rate of $M^{-\frac{2\beta}{2\beta+1}}$ in classical nonparametric regression, where $M$ is the sample size and $\beta$ represents the smoothness exponent of the radial kernel. Our study confirms this alignment for systems with a finite number of particles. We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions. The tLSE bridges the smallest eigenvalue of random matrices and Sobolev embedding. This estimator relies on nonasymptotic estimates for the left tail probability of the smallest eigenvalue of the normal matrix. The lower minimax rate is derived using the Fano-Tsybakov hypothesis testing method. Our findings reveal that provided the inverse problem in the large sample limit satisfies a coercivity condition, the left tail probability does not alter the bias-variance tradeoff, and the optimal minimax rate remains intact. Our tLSE method offers a straightforward approach for establishing the optimal minimax rate for models with either local or nonlocal dependency.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2311.16852

Country:

North America > United States > New York (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback

Droplets of Good Representations: Grokking as a First Order Phase Transition in Two Layer Networks

Rubin, Noa, Seroussi, Inbar, Ringel, Zohar

arXiv.org Machine LearningNov-22-2023

A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in recently reported Grokking phenomena. While mainly reflected as a sudden increase in test accuracy, Grokking is also believed to be a beyond lazy-learning/Gaussian Process (GP) phenomenon involving feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular addition teachers. We provide analytical predictions on feature learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.

artificial intelligence, machine learning, transition, (17 more...)

arXiv.org Machine Learning

2310.03789

Country: Asia > Middle East > Israel (0.46)

Genre: Research Report (0.50)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Spectral-Bias and Kernel-Task Alignment in Physically Informed Neural Networks

Seroussi, Inbar, Miron, Asaf, Ringel, Zohar

arXiv.org Machine LearningOct-5-2023

Physically informed neural networks (PINNs) are a promising emerging method for solving differential equations. As in many other deep learning approaches, the choice of PINN design and training protocol requires careful craftsmanship. Here, we suggest a comprehensive theoretical framework that sheds light on this important problem. Leveraging an equivalence between infinitely over-parameterized neural networks and Gaussian process regression (GPR), we derive an integro-differential equation that governs PINN prediction in the large data-set limit -- the neurally-informed equation. This equation augments the original one by a kernel term reflecting architecture choices and allows quantifying implicit bias induced by the network via a spectral decomposition of the source term in the original differential equation.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2307.06362

Country: Asia > Middle East > Israel (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models

Collins-Woodfin, Elizabeth, Paquette, Courtney, Paquette, Elliot, Seroussi, Inbar

arXiv.org Artificial IntelligenceAug-17-2023

We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. logistic regression, phase retrieval) with general data-covariance. In particular, we demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations that describes a wide class of statistics, such as the risk and other measures of sub-optimality. This equivalence holds with overwhelming probability when the model parameter count grows proportionally to the number of data. This framework allows us to obtain learning rate thresholds for stability of SGD as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we illustrate this theory on some standard examples and show numerical simulations which give an excellent match to the theory.

artificial intelligence, machine learning, sgd, (17 more...)

arXiv.org Artificial Intelligence

2308.08977

Country: North America > Canada > Quebec > Montreal (0.13)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Speed Limits for Deep Learning

Seroussi, Inbar, Alemi, Alexander A., Helias, Moritz, Ringel, Zohar

arXiv.org Artificial IntelligenceJul-27-2023

State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advancement in stochastic thermodynamics which allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance and the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable neural networks e.g. Neural Tangent Kernel (NTK). Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense. Our results are consistent with small-scale experiments with Convolutional Neural Networks (CNNs) and Fully Connected Neural networks (FCNs) on CIFAR-10, showing a short highly non-optimal regime followed by a longer optimal regime.

artificial intelligence, machine learning, speed limit, (20 more...)

arXiv.org Artificial Intelligence

2307.14653

Country:

Europe (0.28)
Asia > Middle East > Israel (0.28)

Genre: Research Report (0.70)

Industry:

Transportation (0.63)
Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Separation of scales and a thermodynamic description of feature learning in some CNNs

Seroussi, Inbar, Ringel, Zohar

arXiv.org Machine LearningDec-31-2021

Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Due to their scale and complexity, often involving billions of inter-dependent internal degrees of freedom, exact analysis approaches often fall short. A common strategy in such cases is to identify slow degrees of freedom that average out the erratic behavior of the underlying fast microscopic variables. Here, we identify such a separation of scales occurring in over-parameterized deep convolutional neural networks (CNNs) at the end of training. It implies that neuron pre-activations fluctuate in a nearly Gaussian manner with a deterministic latent kernel. While for CNNs with infinitely many channels these kernels are inert, for finite CNNs they adapt and learn from data in an analytically tractable manner. The resulting thermodynamic theory of deep learning yields accurate predictions on several deep non-linear CNN toy models. In addition, it provides new ways of analyzing and understanding CNNs.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Machine Learning

2112.15383

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Lower Bounds on the Generalization Error of Nonlinear Learning Models

Seroussi, Inbar, Zeitouni, Ofer

arXiv.org Machine LearningApr-10-2021

We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime. We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks. In the linear case the bound is asymptotically tight. In the nonlinear case, we provide a comparison of our bounds with an empirical study of the stochastic gradient descent algorithm. The analysis uses elements from the theory of large random matrices.

deep learning, matrix, neural network, (17 more...)

arXiv.org Machine Learning

2103.14723

Country:

Europe (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback