
Jacobian matrix


Supplementary Material for LEPARD: Learning Explicit Part Discovery for 3D Articulated Shape Reconstruction

Neural Information Processing Systems

In this section, we provide a detailed derivation of the kinematics proposed in the main paper. The numbers in parentheses indicate the dimensions of the output features. S is a shape matrix that we set to the identity matrix I in LEPARD, since we use a one-to-one mapping for the local deformation estimation. Finally, we obtain a pseudo ground-truth object silhouette G by thresholding the minimum feature distance to the cluster centers. In Figure 1, we provide the architecture of the encoder-decoder model proposed in the main paper.
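The silhouette-thresholding step can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: the function name, the threshold `tau`, and the array shapes are our own assumptions.

```python
import numpy as np

def pseudo_silhouette(features, centers, tau=0.5):
    """Pseudo ground-truth silhouette by thresholding the minimum
    feature distance to the cluster centers (illustrative sketch;
    `tau` is an assumed threshold, not a value from the paper)."""
    # features: (N, D) per-pixel features; centers: (K, D) cluster centers
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
    min_dist = dists.min(axis=1)                # distance to the nearest center
    return (min_dist < tau).astype(np.float32)  # 1.0 = object, 0.0 = background
```

Pixels whose features lie close to some cluster center are marked foreground; everything else becomes background.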


Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks

Neural Information Processing Systems

Natural gradient descent has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, especially for non-linear networks. In this work, we analyze for the first time the speed of convergence to a global optimum for natural gradient descent on non-linear neural networks with the squared error loss. We identify two conditions that guarantee global convergence: (1) the Jacobian matrix (of the network's output for all training cases w.r.t. the parameters) has full row rank, and (2) the Jacobian matrix is stable under small perturbations around the initialization. For two-layer ReLU neural networks (i.e., with one hidden layer), we prove that these two conditions do hold throughout training, under the assumptions that the inputs do not degenerate and the network is over-parameterized. We further extend our analysis to more general loss functions with similar convergence properties. Lastly, we show that K-FAC, an approximate natural gradient descent method, also converges to global minima under the same assumptions.
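Condition (1) can be checked numerically for a small two-layer ReLU network. The sketch below is our own illustration (names and scaling are assumptions, not the paper's construction): it builds the Jacobian of the scalar outputs with respect to the first-layer weights, one row per training case, and verifies full row rank in an over-parameterized regime.

```python
import numpy as np

def relu_net_jacobian(X, W, a):
    """Jacobian of f(x_i) = a @ relu(W @ x_i) w.r.t. the first-layer
    weights W, one row per training case (illustrative sketch)."""
    n, d = X.shape
    m = W.shape[0]
    active = (X @ W.T > 0).astype(X.dtype)                 # (n, m) ReLU gates
    # d f(x_i) / d W[r, :] = a[r] * 1{w_r . x_i > 0} * x_i
    J = (active * a[None, :])[:, :, None] * X[:, None, :]  # (n, m, d)
    return J.reshape(n, m * d)

rng = np.random.default_rng(0)
n, d, m = 5, 3, 50                      # width m much larger than n cases
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)
J = relu_net_jacobian(X, W, a)
# Full row rank <=> the Gram matrix J @ J.T is positive definite.
full_rank = np.linalg.matrix_rank(J) == n
```

For generic Gaussian inputs and an over-parameterized width, the Jacobian is full row rank with overwhelming probability, which is the regime the convergence result relies on.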


On the Stability of the Jacobian Matrix in Deep Neural Networks

Dadoun, Benjamin, Hayou, Soufiane, Salam, Hanan, Seddik, Mohamed El Amine, Youssef, Pierre

arXiv.org Artificial Intelligence

Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.
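The spectral behavior in question can be probed empirically. A minimal sketch, assuming a fully connected ReLU network with i.i.d. He-style initialization (weight variance 2/fan_in, one common "critical" choice); the function and parameter names are ours, not the paper's:

```python
import numpy as np

def io_jacobian(x, widths, seed=0):
    """Input-output Jacobian of a deep ReLU net at He initialization
    (weight variance 2 / fan_in) -- an illustrative sketch."""
    rng = np.random.default_rng(seed)
    J, h = np.eye(widths[0]), x
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
        pre = W @ h
        gates = (pre > 0).astype(float)
        J = gates[:, None] * (W @ J)   # chain rule: diag(gates) @ W @ J
        h = np.maximum(pre, 0.0)
    return J

x = np.random.default_rng(1).normal(size=200)
s = np.linalg.svd(io_jacobian(x, [200, 200, 200, 200, 200]), compute_uv=False)
msq = float(np.mean(s**2))   # stays O(1) at criticality: no explosion/vanishing
```

At a critical initialization the mean squared singular value of the input-output Jacobian remains of order one as depth grows; away from criticality it shrinks or blows up geometrically with depth, which is the exploding/vanishing-gradient phenomenon the abstract refers to.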