Supplement to: Embedding Principle of Loss Landscape of Deep Neural Networks
Neural Information Processing Systems (NeurIPS)
However, this transform does not provide information about the degeneracy of critical points/manifolds. Clearly, this transform is also a critical transform, i.e., it maps critical points of the narrower network's loss landscape to critical points of the wider network's loss landscape. A sketch of such an embedding is given below.
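As an illustration, here is a minimal NumPy sketch of a one-step neuron-splitting embedding, one concrete instance of a critical transform: a hidden neuron is duplicated with identical input weights, and its outgoing weight is split by factors beta and 1 - beta, which leaves the output function (and hence the loss) unchanged. The one-hidden-layer tanh architecture, the neuron index `j`, and the split factor `beta` are illustrative assumptions, not details taken from this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def narrow_net(x, W1, b1, w2, b2):
    """One-hidden-layer tanh network, R -> R, applied to a batch x."""
    return np.tanh(x @ W1 + b1) @ w2 + b2

def split_embed(W1, b1, w2, j, beta=0.5):
    """Duplicate hidden neuron j; split its outgoing weight as beta / (1 - beta).

    Assumption: this is the neuron-splitting form of embedding; the split
    preserves the network's output function for every input.
    """
    W1_wide = np.concatenate([W1, W1[:, j:j + 1]], axis=1)  # copy input weights
    b1_wide = np.concatenate([b1, b1[j:j + 1]])             # copy bias
    w2_wide = np.concatenate([w2, (1 - beta) * w2[j:j + 1]])
    w2_wide[j] = beta * w2[j]                               # rescale original neuron
    return W1_wide, b1_wide, w2_wide

# Check output preservation on random inputs.
m = 3                                            # hidden width (assumed)
W1 = rng.normal(size=(1, m)); b1 = rng.normal(size=m)
w2 = rng.normal(size=m);      b2 = 0.1
x = rng.normal(size=(5, 1))

W1w, b1w, w2w = split_embed(W1, b1, w2, j=0, beta=0.3)
assert np.allclose(narrow_net(x, W1, b1, w2, b2),
                   narrow_net(x, W1w, b1w, w2w, b2))
```

Because the output function is preserved for every input, the loss is preserved along the embedding, so the image of a critical point is again a critical point; as remarked above, this says nothing about how degenerate the embedded critical point is.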
For the 1D fitting experiments (Figs. 1, 3(a), 4), we use tanh as the activation function and mean squared error as the loss function. Depending on the experiment, training uses full-batch gradient descent with learning rate 0.005, the default Adam optimizer with full batch and learning rate 0.02, or the default Adam optimizer with full batch and learning rate 0.00003. The resulting output functions are shown in the figures. Remark that, although Figs. 1 and 5 are case studies each based on a random trial, similar phenomena are observed across repeated trials. A sketch of this training setup follows.
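The following PyTorch sketch reproduces the shape of this setup for the full-batch gradient-descent variant. The target function sin(3x), the hidden width of 50, the input grid, and the epoch count are hypothetical placeholders, since the excerpt does not specify them; only the activation, loss, and learning rates come from the text above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Full batch of 1D inputs and a hypothetical 1D target function.
x = torch.linspace(-1, 1, 80).unsqueeze(1)
y = torch.sin(3 * x)

# One-hidden-layer tanh network with mean squared error loss, as in the text.
model = nn.Sequential(nn.Linear(1, 50), nn.Tanh(), nn.Linear(50, 1))
loss_fn = nn.MSELoss()

# Plain full-batch gradient descent with learning rate 0.005
# (SGD with zero momentum over the whole dataset each step).
opt = torch.optim.SGD(model.parameters(), lr=0.005)
# For the Adam variants mentioned above, swap in, e.g.:
# opt = torch.optim.Adam(model.parameters(), lr=0.02)
# opt = torch.optim.Adam(model.parameters(), lr=3e-5)

for epoch in range(10_000):          # epoch count is assumed for illustration
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```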
Paper checklist (excerpt):
1. Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
2. Did you state the full set of assumptions of all theoretical results?
3. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material.
4. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?
5. Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?