40bb79c081828bebdc39d65a82367246-Supplemental-Conference.pdf
Neural Information Processing Systems
Table 1: Linear network

Layer #  Name  Layer                  In shape   Out shape
1        -     Flatten()              (3,32,32)  3072
2        fc1   nn.Linear(3072, 200)   3072       200
3        fc2   nn.Linear(200, 1)      200        1

Fully-connected Network

We conduct further experiments on several fully-connected networks with 4 hidden layers and various activation functions. Our subset is smaller because of the computational cost of calculating the Gram matrix. Experiments show that the properties along the GD trajectory (e.g.

In this appendix we consider simple linear networks, fully-connected networks, and convolutional networks. Figure 4 illustrates the positive correlation between the sharpness and the A-norm, and the relationship between the loss $\|D(t)\|^2$ and $\|R(t)\|^2$ along the trajectory.
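The shapes listed in Table 1 can be checked with a small bookkeeping sketch. It uses plain Python rather than the authors' code, so everything beyond the table entries is an assumption: the helper `trace_shapes` and the layer name `flatten` are hypothetical, and the parameter counts assume each linear layer has a bias term (the default for `nn.Linear`), which the table does not state.

```python
from math import prod

# Layers of Table 1 as (name, kind, args). "fc1" and "fc2" come from the table;
# the Flatten row has no recoverable name, so "flatten" is a placeholder.
LAYERS = [
    ("flatten", "Flatten", ()),
    ("fc1", "Linear", (3072, 200)),
    ("fc2", "Linear", (200, 1)),
]


def trace_shapes(in_shape, layers, bias=True):
    """Return [(name, in_shape, out_shape, n_params)] for each layer in order."""
    rows = []
    shape = tuple(in_shape)
    for name, kind, args in layers:
        if kind == "Flatten":
            # Flatten collapses all per-sample dimensions into one; no parameters.
            out_shape, n_params = (prod(shape),), 0
        elif kind == "Linear":
            fan_in, fan_out = args
            if shape != (fan_in,):
                raise ValueError(f"{name}: expected input ({fan_in},), got {shape}")
            out_shape = (fan_out,)
            # Weight matrix plus an optional bias vector (assumed present).
            n_params = fan_in * fan_out + (fan_out if bias else 0)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        rows.append((name, shape, out_shape, n_params))
        shape = out_shape
    return rows


# One (3, 32, 32) input sample, as in the first row of Table 1.
rows = trace_shapes((3, 32, 32), LAYERS)
total_params = sum(r[3] for r in rows)

for name, s_in, s_out, n in rows:
    print(f"{name:8s} {str(s_in):14s} -> {str(s_out):8s} params={n}")
print("total parameters:", total_params)
```

Under the bias assumption, the hidden layer accounts for almost all of the parameters (3072 × 200 weights plus 200 biases); passing `bias=False` to the helper gives the count without bias terms.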
Feb-8-2026, 13:34:47 GMT