Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks