Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
