Supplement to: Embedding Principle of Loss Landscape of Deep Neural Networks
Neural Information Processing Systems (NeurIPS)
However, this transform does not provide information about the degeneracy of critical points/manifolds. Clearly, this transform is also a critical transform, i.e., it maps critical points of the narrower network's loss landscape to critical points of the wider network's loss landscape. A sketch of such an embedding is given below.
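As an illustration, here is a minimal NumPy sketch of a one-step neuron-splitting embedding, one concrete instance of a critical transform: a hidden neuron is duplicated with identical input weights, and its outgoing weight is split by factors beta and 1 - beta, which leaves the output function (and hence the loss) unchanged. The one-hidden-layer tanh architecture, the neuron index `j`, and the split factor `beta` are illustrative assumptions, not details taken from this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def narrow_net(x, W1, b1, w2, b2):
    """One-hidden-layer tanh network, R -> R, applied to a batch x."""
    return np.tanh(x @ W1 + b1) @ w2 + b2

def split_embed(W1, b1, w2, j, beta=0.5):
    """Duplicate hidden neuron j; split its outgoing weight as beta / (1 - beta).

    Assumption: this is the neuron-splitting form of embedding; the split
    preserves the network's output function for every input.
    """
    W1_wide = np.concatenate([W1, W1[:, j:j + 1]], axis=1)  # copy input weights
    b1_wide = np.concatenate([b1, b1[j:j + 1]])             # copy bias
    w2_wide = np.concatenate([w2, (1 - beta) * w2[j:j + 1]])
    w2_wide[j] = beta * w2[j]                               # rescale original neuron
    return W1_wide, b1_wide, w2_wide

# Check output preservation on random inputs.
m = 3                                            # hidden width (assumed)
W1 = rng.normal(size=(1, m)); b1 = rng.normal(size=m)
w2 = rng.normal(size=m);      b2 = 0.1
x = rng.normal(size=(5, 1))

W1w, b1w, w2w = split_embed(W1, b1, w2, j=0, beta=0.3)
assert np.allclose(narrow_net(x, W1, b1, w2, b2),
                   narrow_net(x, W1w, b1w, w2w, b2))
```

Because the output function is preserved for every input, the loss is preserved along the embedding, so the image of a critical point is again a critical point; as remarked above, this says nothing about how degenerate the embedded critical point is.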
For the 1D fitting experiments (Figs. 1, 3(a), 4), we use tanh as the activation function and mean squared error as the loss function. Depending on the experiment, training uses full-batch gradient descent with learning rate 0.005, the default Adam optimizer with full batch and learning rate 0.02, or the default Adam optimizer with full batch and learning rate 0.00003. The resulting output functions are shown in the figures. Remark that, although Figs. 1 and 5 are case studies each based on a random trial, similar phenomena are observed across repeated trials. A sketch of this training setup follows.
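The following PyTorch sketch reproduces the shape of this setup for the full-batch gradient-descent variant. The target function sin(3x), the hidden width of 50, the input grid, and the epoch count are hypothetical placeholders, since the excerpt does not specify them; only the activation, loss, and learning rates come from the text above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Full batch of 1D inputs and a hypothetical 1D target function.
x = torch.linspace(-1, 1, 80).unsqueeze(1)
y = torch.sin(3 * x)

# One-hidden-layer tanh network with mean squared error loss, as in the text.
model = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
loss_fn = nn.MSELoss()

# Plain full-batch gradient descent with learning rate 0.005
# (SGD with zero momentum over the whole dataset each step).
opt = torch.optim.SGD(model.parameters(), lr=0.005)
# For the Adam variants mentioned above, swap in, e.g.:
# opt = torch.optim.Adam(model.parameters(), lr=0.02)
# opt = torch.optim.Adam(model.parameters(), lr=3e-5)

for epoch in range(10_000):          # epoch count is assumed for illustration
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```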
Paper checklist (excerpt):
1. Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
2. Did you state the full set of assumptions of all theoretical results?
3. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material.
4. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?
5. Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?