Lenka Zdeborová
Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models
Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová
Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they succeed in optimising high-dimensional non-convex functions in practice and why they find good minima instead of being trapped in spurious ones. Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model. Our framework is based on the Kac-Rice analysis of stationary points and a closed-form analysis of gradient-flow originating from statistical physics. We show that there is a well-defined region of parameters where the gradient-flow algorithm finds a good global minimum despite the presence of exponentially many spurious local minima. We show that this is achieved by surfing on saddles that have a strong negative direction towards the global minimum, a phenomenon connected to a BBP-type threshold in the Hessian describing the critical points of the landscape.
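As a concrete illustration of the setting, the sketch below discretises gradient flow as small-step gradient descent on a least-squares fit of a spiked matrix and a spiked 3-tensor, with the estimate kept on the sphere. The system size, noise levels delta2 and delta3, step size and gradient prefactors are illustrative assumptions, not the paper's exact conventions.

    import numpy as np

    rng = np.random.default_rng(0)
    N, delta2, delta3, dt, steps = 64, 0.3, 2.0, 0.02, 3000   # illustrative values

    # Planted spike on the sphere |x|^2 = N.
    x_star = rng.standard_normal(N)
    x_star *= np.sqrt(N) / np.linalg.norm(x_star)

    # Noisy observations: a spiked matrix Y and a spiked 3-tensor T.
    Y = np.outer(x_star, x_star) / np.sqrt(N) + np.sqrt(delta2) * rng.standard_normal((N, N))
    Y = (Y + Y.T) / 2.0
    T = (np.einsum('i,j,k->ijk', x_star, x_star, x_star) / N
         + np.sqrt(delta3) * rng.standard_normal((N, N, N)))

    def grad(x):
        # Gradient of a least-squares fit of Y and T (prefactors chosen for illustration).
        g2 = -(2.0 / (delta2 * np.sqrt(N))) * (Y - np.outer(x, x) / np.sqrt(N)) @ x
        r = T - np.einsum('i,j,k->ijk', x, x, x) / N
        g3 = -(3.0 / (delta3 * N)) * np.einsum('ijk,j,k->i', r, x, x)
        return g2 + g3

    x = rng.standard_normal(N)                    # random initialisation
    x *= np.sqrt(N) / np.linalg.norm(x)
    for _ in range(steps):
        x = x - dt * grad(x)                      # small-step (discretised) gradient flow
        x *= np.sqrt(N) / np.linalg.norm(x)       # project back onto the sphere
    print("overlap with the spike:", abs(x @ x_star) / N)

The printed overlap indicates whether the flow reached the region of the global minimum or remained stuck; the paper characterises analytically for which noise levels each outcome occurs.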
The committee machine: Computational to statistical gaps in learning a two-layers neural network
Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová
Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layer neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows optimal learning to be performed in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases and unveiling a large computational gap. While the traditional approach to learning and generalization follows the Vapnik-Chervonenkis [1] and Rademacher [2] worst-case type bounds, there has been a considerable body of theoretical work on calculating the generalization ability of neural networks for data arising from a probabilistic model within the framework of statistical mechanics [3, 4, 5, 6, 7]. In the wake of the need to understand the effectiveness of neural networks and also the limitations of the classical approaches [8], it is of interest to revisit the results that have emerged thanks to the physics perspective. This direction is currently experiencing a strong revival, see e.g.
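The sketch below only sets up the teacher-student data for a committee machine with sign activations: a teacher network generates the labels that a student of the same architecture would then learn from. Sizes and the Gaussian input distribution are illustrative assumptions; the AMP algorithm itself is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(1)
    N, K, n_samples = 200, 3, 1000                 # input dimension, hidden units, dataset size

    W_teacher = rng.standard_normal((K, N))        # the "planted" teacher weights

    def committee(W, X):
        # Committee machine: majority vote of K sign-perceptrons.
        return np.sign(np.sign(X @ W.T / np.sqrt(N)).sum(axis=1))

    X = rng.standard_normal((n_samples, N))        # i.i.d. Gaussian inputs
    y = committee(W_teacher, X)                    # labels produced by the teacher

    # A student committee machine is then estimated from (X, y); the paper analyses the
    # Bayes-optimal generalization error of this task and an AMP algorithm for it.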
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
Sebastian Goldt, Madhu Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová
Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data. We study this phenomenon by analysing the dynamics and the performance of over-parameterised two-layer neural networks in the teacher-student setup, where one network, the student, is trained on data generated by another network, called the teacher. We show how the dynamics of stochastic gradient descent (SGD) is captured by a set of differential equations and prove that this description is asymptotically exact in the limit of large inputs. Using this framework, we calculate the final generalisation error of student networks that have more parameters than their teachers. We find that the final generalisation error of the student increases with network size when training only the first layer, but stays constant or even decreases with size when training both layers. We show that these different behaviours have their root in the different solutions SGD finds for different activation functions. Our results indicate that achieving good generalisation in neural networks goes beyond the properties of SGD alone and depends on the interplay of at least the algorithm, the model architecture, and the data set. Deep neural networks behind state-of-the-art results in image classification and other domains have one thing in common: their size.
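A minimal simulation of this setup, under illustrative assumptions (erf activations, Gaussian inputs, chosen sizes and learning rate): an over-parameterised student with K hidden units is trained by online SGD on data labelled by a teacher with M < K units, training only the first layer, and the generalisation error is estimated by Monte Carlo at the end. This is a direct simulation of the process, not the paper's closed set of differential equations.

    import numpy as np
    from scipy.special import erf

    rng = np.random.default_rng(2)
    N, M, K, lr, steps = 200, 2, 4, 0.5, 50_000    # illustrative sizes and learning rate

    g = lambda z: erf(z / np.sqrt(2))              # activation function

    W_t = rng.standard_normal((M, N))              # teacher first-layer weights
    v_t = np.ones(M)                               # teacher second-layer weights (fixed)
    W_s = 0.1 * rng.standard_normal((K, N))        # over-parameterised student, K > M
    v_s = np.ones(K) / K                           # student second layer, kept fixed here

    for _ in range(steps):                         # online SGD: a fresh sample at every step
        x = rng.standard_normal(N)
        pre = W_s @ x / np.sqrt(N)
        err = v_s @ g(pre) - v_t @ g(W_t @ x / np.sqrt(N))
        dg = np.sqrt(2 / np.pi) * np.exp(-pre**2 / 2)            # derivative of g
        W_s -= lr * err * (v_s * dg)[:, None] * x[None, :] / np.sqrt(N)

    # Monte-Carlo estimate of the generalisation error after training.
    X_test = rng.standard_normal((5000, N))
    pred = g(X_test @ W_s.T / np.sqrt(N)) @ v_s
    truth = g(X_test @ W_t.T / np.sqrt(N)) @ v_t
    print("generalisation error:", 0.5 * np.mean((pred - truth) ** 2))

Repeating the run while also updating v_s corresponds to the comparison the abstract draws between training only the first layer and training both layers.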
The spiked matrix model with generative priors
Benjamin Aubin, Bruno Loureiro, Antoine Maillard, Florent Krzakala, Lenka Zdeborová
Using a low-dimensional parametrization of signals is a generic and powerful way to enhance performance in signal processing and statistical inference. A very popular and widely explored type of dimensionality reduction is sparsity; another type is generative modelling of signal distributions. Generative models based on neural networks, such as GANs or variational auto-encoders, are particularly performant and are gaining in applicability. In this paper we study spiked matrix models, where a low-rank matrix is observed through a noisy channel. The case in which the spike has a sparse structure has attracted broad attention in the past literature.
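The sketch below generates a spiked matrix whose spike comes from a hypothetical single-layer generative prior x = sign(A z), and estimates it with plain PCA (the leading eigenvector), ignoring the generative structure; the dimensions and the signal-to-noise ratio lam are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    N, k, lam = 1000, 100, 5.0                     # signal dimension, latent dimension, SNR

    A = rng.standard_normal((N, k))                # one-layer generative network with sign output
    z = rng.standard_normal(k)                     # low-dimensional latent variable
    x = np.sign(A @ z)                             # structured spike, entries +/- 1

    xi = rng.standard_normal((N, N))
    W = (xi + xi.T) / np.sqrt(2)                   # symmetric Gaussian noise
    Y = np.sqrt(lam / N) * np.outer(x, x) + W      # noisy rank-one (spiked) observation

    # Baseline estimator that ignores the generative prior: leading eigenvector of Y.
    eigvals, eigvecs = np.linalg.eigh(Y)           # eigenvalues returned in ascending order
    x_hat = eigvecs[:, -1] * np.sqrt(N)
    print("PCA overlap with the spike:", abs(x_hat @ x) / N)

The paper studies how exploiting the generative prior changes the statistical and algorithmic thresholds of this estimation problem relative to such a spectral baseline.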