2281f5c898351dbc6dace2ba201e7948-AuthorFeedback.pdf

Feb-11-2026, 16:57:16 GMT–Neural Information Processing Systems

From an optimization perspective, as the reviewer pointed out, we use a preconditioning matrix to change the curvature and reduce the condition number of the optimization problem. We use the metaphor of statistical strength to refer to the fact that, by taking into account the correlations between data/gradients, we improve the effective sample size. From an optimization viewpoint, reducing the number of hidden features will not help optimization since the condition number can still be very large. To address the raised concerns, we performed additional experiments using only 15 hidden units in the last fully connected layer (the original implementation has 50 hidden units) on MNIST with batch size 256. The results for {Regularizing_Type/Hidden_Dim} with {L2/50}, {L2/15}, {AdaReg/50}, and {AdaReg/15} are 97.53%, ... On the MNIST dataset, for most of the methods except AdaReg and BatchNorm, we do observe that smaller batch size leads to better generalization.
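The claim that a preconditioning matrix can reduce the condition number can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the AdaReg method discussed in the feedback: it applies a simple diagonal (Jacobi) preconditioner to a small, badly scaled quadratic. The matrix `H`, the scaling vector, and the helper names are all hypothetical and chosen only for illustration.

```python
import numpy as np


def condition_number(H):
    """Ratio of largest to smallest eigenvalue of a symmetric positive definite matrix."""
    eigvals = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals[0]


def jacobi_precondition(H):
    """Return D^{-1/2} H D^{-1/2}, where D is the diagonal of H (illustrative choice)."""
    d = 1.0 / np.sqrt(np.diag(H))
    return d[:, None] * H * d[None, :]


# Hypothetical toy Hessian: a well-conditioned core matrix whose coordinates
# have been rescaled by very different factors, which inflates the condition number.
core = np.array([[2.0, 1.0, 0.0],
                 [1.0, 2.0, 1.0],
                 [0.0, 1.0, 2.0]])
scales = np.array([1.0, 10.0, 100.0])
H = scales[:, None] * core * scales[None, :]

kappa_before = condition_number(H)
kappa_after = condition_number(jacobi_precondition(H))

print(f"condition number before preconditioning: {kappa_before:.2f}")
print(f"condition number after preconditioning:  {kappa_after:.2f}")
```

In this toy case the diagonal preconditioner undoes the coordinate rescaling, so the preconditioned matrix has the condition number of the core matrix (about 5.83). A diagonal preconditioner is not guaranteed to help on every problem; the example only shows the mechanism.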

condition number, optimization, reviewer



