Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
Gene Ryan Yoo, Houman Owhadi
We introduce a new regularization method for Artificial Neural Networks (ANNs) based on Kernel Flows (KFs). The proposed method simply consists in aggregating (as a weighted sum) a subset of these KF losses with a classical output loss (e.g. ...). We test the proposed method on Convolutional Neural Networks (CNNs) and Wide Residual Networks (WRNs) without altering their structure or their output classifier, and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift, without a significant increase in computational complexity relative to standard CNN and WRN training (with Dropout and Batch Normalization). We suspect that these results might be explained by the following: conventional training only employs a linear functional (a generalized moment) of the empirical distribution defined by the dataset and can be prone to trapping in the Neural Tangent Kernel regime (under over-parameterization), whereas the proposed loss function (defined as a nonlinear functional of the empirical distribution) effectively trains the underlying kernel defined by the CNN, beyond regressing the data with that kernel.

Kernel Flows were introduced in [8] as a method for kernel selection/design in Kriging/Gaussian Process Regression (GPR). Any non-degenerate kernel $K(x, x')$ can be used to approximate $u^\dagger$ with the interpolant

$$u(x) = K(x, X)\, K(X, X)^{-1}\, Y, \tag{1.1}$$

writing $Y := (y_1, \ldots, y_N)^T$, $X := (x_1, \ldots, x_N)$, $K(X, X)$ for the $N \times N$ Gram matrix with entries $K(x_i, x_j)$, and $K(x, X)$ for the $N$-dimensional vector with entries $K(x, x_i)$.
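The interpolant in (1.1) can be illustrated with a minimal NumPy sketch. The Gaussian kernel, the `gamma` bandwidth, the optional `nugget` term, and the toy sine data are all assumptions made here for illustration; the excerpt above only requires that the kernel be non-degenerate.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Kernel matrix with entries exp(-gamma * ||a - b||^2).

    A has shape (n, d) and B has shape (m, d); the result has shape (n, m).
    The Gaussian form is an assumption chosen for this sketch.
    """
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def interpolate(x_new, X, Y, gamma=1.0, nugget=0.0):
    """Evaluate u(x) = K(x, X) K(X, X)^{-1} Y at each row of x_new.

    `nugget` (hypothetical, default 0) adds a multiple of the identity to
    the Gram matrix, a common numerical safeguard not stated in (1.1).
    """
    K_XX = gaussian_kernel(X, X, gamma) + nugget * np.eye(len(X))
    # Solve K(X, X) w = Y rather than forming the inverse explicitly.
    weights = np.linalg.solve(K_XX, Y)
    return gaussian_kernel(x_new, X, gamma) @ weights


# Toy data (assumed): N = 6 samples of a sine curve on [0, 1].
X = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
Y = np.sin(2.0 * np.pi * X[:, 0])

# With no nugget, the interpolant reproduces the training outputs.
u_at_X = interpolate(X, X, Y, gamma=25.0)
```

Evaluating `interpolate` at the training points returns `Y` up to round-off, which is the interpolation property implied by (1.1). Solving the linear system instead of inverting `K(X, X)` is a standard choice for numerical stability.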
Feb-19-2020
- Country:
- North America > United States > California > Los Angeles County > Pasadena (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report (0.50)
- Technology: