Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?

Neural Information Processing Systems

Predictive coding (PC) is an energy-based learning algorithm that performs iterative inference over network activities before updating weights. Recent work suggests that PC can converge in fewer learning steps than backpropagation thanks to its inference procedure. However, these advantages are not always observed, and the impact of PC inference on learning is not theoretically well understood. To address this gap, we study the geometry of the PC weight landscape at the inference equilibrium of the network activities. For deep linear networks, we first show that the equilibrated PC energy is equal to a rescaled mean squared error loss with a weight-dependent rescaling. We then prove that many highly degenerate (non-strict) saddles of the loss, including the origin, become much easier to escape (strict) in the equilibrated energy. Experiments on both linear and non-linear networks strongly validate our theory and further suggest that all the saddles of the equilibrated energy are strict. Overall, this work shows that PC inference makes the loss landscape of feedforward networks more benign and robust to vanishing gradients, while also highlighting the fundamental challenge of scaling PC to very deep models.
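
To make the setup concrete, here is a minimal NumPy sketch of predictive coding on a deep linear network: activities are relaxed by gradient descent on the energy until approximate equilibrium, and only then are the weights updated. The energy form, dimensions, and step sizes are illustrative assumptions, not details taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Deep linear network with activities z_0 (input) through z_L (output).
    # Illustrative PC energy: E = 0.5 * sum_l ||z_l - W_l z_{l-1}||^2.
    dims = [4, 8, 8, 2]
    Ws = [rng.normal(scale=0.1, size=(dims[l + 1], dims[l]))
          for l in range(len(dims) - 1)]

    def pc_step(x, y, Ws, n_inf=50, lr_z=0.1, lr_w=0.01):
        # Initialize activities with a feedforward pass, then clamp the output.
        zs = [x]
        for W in Ws:
            zs.append(W @ zs[-1])
        zs[-1] = y

        # Inference: gradient descent on E w.r.t. the hidden activities only.
        for _ in range(n_inf):
            eps = [zs[l + 1] - Ws[l] @ zs[l] for l in range(len(Ws))]
            for l in range(1, len(zs) - 1):
                # dE/dz_l = eps_{l-1} - W_l^T eps_l
                zs[l] = zs[l] - lr_z * (eps[l - 1] - Ws[l].T @ eps[l])

        # Weight update at the (approximate) inference equilibrium.
        eps = [zs[l + 1] - Ws[l] @ zs[l] for l in range(len(Ws))]
        for l, W in enumerate(Ws):
            W += lr_w * np.outer(eps[l], zs[l])  # -dE/dW_l = eps_l z_{l-1}^T
        return 0.5 * sum(float(e @ e) for e in eps)

    x, y = rng.normal(size=dims[0]), rng.normal(size=dims[-1])
    for _ in range(200):
        energy = pc_step(x, y, Ws)
    print(f"equilibrated energy after training: {energy:.6f}")

The theory above concerns exactly this quantity: the energy evaluated at the activity equilibrium, viewed as a function of the weights.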


Detecting Invariant Manifolds in ReLU-Based RNNs

Eisenmann, Lukas, Brändle, Alena, Monfared, Zahra, Durstewitz, Daniel

arXiv.org Artificial Intelligence

Recurrent Neural Networks (RNNs) have found widespread applications in machine learning for time series prediction and dynamical systems reconstruction, and have experienced a recent renaissance thanks to improved training algorithms and architectural designs. Understanding why and how trained RNNs produce their behavior is important for scientific and medical applications, and for explainable AI more generally. An RNN's dynamical repertoire depends on the topological and geometrical properties of its state space. Stable and unstable manifolds of periodic points play a particularly important role: they dissect a dynamical system's state space into different basins of attraction, and their intersections lead to chaotic dynamics with fractal geometry. Here we introduce a novel algorithm for detecting these manifolds, with a focus on piecewise-linear RNNs (PLRNNs) employing rectified linear units (ReLUs) as their activation function. We demonstrate how the algorithm can be used to trace the boundaries between different basins of attraction, and hence to characterize multistability, a computationally important property. We further show its utility in finding so-called homoclinic points, the intersections between stable and unstable manifolds, and thus establish the existence of chaos in PLRNNs. Finally, we show, for an empirical example (electrophysiological recordings from a cortical neuron), how insights into the underlying dynamics can be gained through our method.
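
For context, the following minimal NumPy sketch illustrates the piecewise-linear structure such an algorithm exploits; it is not the authors' manifold-detection method. Within each ReLU activation region a PLRNN is affine, so candidate fixed points and their stability follow from one linear solve per region. The PLRNN form z_{t+1} = A z_t + W relu(z_t) + h and the brute-force enumeration of regions are illustrative assumptions.

    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    n = 3  # state dimension, kept small so all 2^n ReLU regions can be enumerated

    # PLRNN map: z_{t+1} = A z_t + W relu(z_t) + h
    A = np.diag(rng.uniform(0.2, 0.9, n))
    W = rng.normal(scale=0.8, size=(n, n))
    np.fill_diagonal(W, 0.0)
    h = rng.normal(size=n)

    found = 0
    for d in itertools.product([0.0, 1.0], repeat=n):
        D = np.diag(d)            # which units are active in this region
        J = A + W @ D             # Jacobian of the affine map on the region
        try:
            z = np.linalg.solve(np.eye(n) - J, h)  # candidate fixed point
        except np.linalg.LinAlgError:
            continue
        if np.all((z > 0) == np.array(d, dtype=bool)):  # lies in its own region?
            mags = np.abs(np.linalg.eigvals(J))
            kind = ("stable" if mags.max() < 1
                    else "saddle" if mags.min() < 1 else "unstable")
            found += 1
            print(f"fixed point {np.round(z, 3)} is {kind}, |eigs| = {np.round(mags, 3)}")
    print(f"{found} region-consistent fixed points")

At a saddle found this way, the stable and unstable manifolds are locally tangent to the stable and unstable eigenspaces of the region's Jacobian, which is the local information a manifold-tracing procedure starts from.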


On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks

de Oliveira, Arthur Castello Branco, Jatkar, Dhruv, Sontag, Eduardo

arXiv.org Artificial Intelligence

This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We analyze the gradient flow associated with overparameterized optimization problems, which can be interpreted as training a neural network with linear activations. Remarkably, we show that the global convergence properties can be derived for any cost function that is proper and real analytic. We then specialize the analysis to scalar-valued cost functions, where the geometry of the landscape can be fully characterized. In this setting, we demonstrate that key structural features -- such as the location and stability of saddle points -- are universal across all admissible costs, depending solely on the overparameterized representation rather than on problem-specific details. Moreover, we show that convergence can be arbitrarily accelerated depending on the initialization, as measured by an imbalance metric introduced in this work. Finally, we discuss how these insights may generalize to neural networks with sigmoidal activations, showing through a simple example which geometric and dynamical properties persist beyond the linear case.
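
A minimal sketch of the scalar case described above: overparameterize p = u * v and Euler-integrate the gradient flow. The specific cost and the use of u^2 - v^2 as the imbalance are assumptions for illustration; under this parameterization the imbalance is conserved along the flow, since d/dt (u^2 - v^2) = -2uv c'(uv) + 2uv c'(uv) = 0, and more imbalanced initializations converge faster, matching the acceleration phenomenon described in the abstract.

    # Scalar cost c(p) = 0.5 * (p - 1)^2, overparameterized as p = u * v.
    # Gradient flow: du/dt = -v * c'(uv), dv/dt = -u * c'(uv).
    def gradient_flow(u, v, dt=1e-3, steps=5000):
        for _ in range(steps):
            g = u * v - 1.0  # c'(uv)
            u, v = u - dt * v * g, v - dt * u * g
        return u, v

    # Same initial product p = 0.25, different imbalance u^2 - v^2.
    for u0, v0 in [(0.5, 0.5), (2.0, 0.125)]:
        u, v = gradient_flow(u0, v0)
        print(f"imbalance {u0**2 - v0**2:+.3f} -> {u**2 - v**2:+.3f}, "
              f"final cost {0.5 * (u * v - 1.0)**2:.2e}")

The imbalance is (numerically) conserved by the flow, and the run started with larger imbalance reaches a lower cost in the same time, since the local convergence rate near the minimum scales with u^2 + v^2.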