Generalization Bound of Gradient Flow through Training Trajectory and Data-dependent Kernel

Neural Information Processing Systems 

Gradient-based optimization methods have shown remarkable empirical success, yet their theoretical generalization properties remain only partially understood. In this paper, we establish a generalization bound for gradient flow that aligns with the classical Rademacher complexity bounds for kernel methods (specifically, those based on the RKHS norm and the kernel trace) through a data-dependent kernel called the loss path kernel (LPK).
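
For reference, the classical kernel bound mentioned above can be written out as follows. This is a sketch of the standard textbook result, not a statement of this paper's theorem. The symbols $k$, $\mathcal{H}$, $B$, $n$, $K$, and $\mathcal{F}_B$ are notation assumed here for illustration, and how the LPK enters the corresponding quantities in the paper's bound is not specified in this section.

```latex
% Assumed notation: k is a positive semidefinite kernel with RKHS \mathcal{H},
% S = (x_1, \dots, x_n) is the sample, and K is its Gram matrix.
\[
  \mathcal{F}_B = \{\, f \in \mathcal{H} : \|f\|_{\mathcal{H}} \le B \,\},
  \qquad
  K_{ij} = k(x_i, x_j), \quad 1 \le i, j \le n .
\]
% Empirical Rademacher complexity of the norm ball, controlled by the
% RKHS norm bound B and the trace of the Gram matrix:
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F}_B)
  \;=\;
  \mathbb{E}_{\sigma}\!\left[
    \sup_{f \in \mathcal{F}_B} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)
  \right]
  \;\le\;
  \frac{B \sqrt{\operatorname{tr}(K)}}{n} .
\]
```

The two ingredients of this bound, the RKHS norm bound $B$ and the trace $\operatorname{tr}(K)$, are the quantities the abstract refers to as "RKHS norm and kernel trace".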