Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization

May-27-2025, 06:38:21 GMT–Neural Information Processing Systems

Can we modify the training data distribution to steer the underlying optimization method toward solutions that generalize better on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we rigorously prove that SAM learns different features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias than GD. We also show that examples containing features that are learned early are separable from the rest based on the model's output.
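To make the GD versus SAM comparison concrete, the following is a minimal sketch of one update step of each, using the standard SAM formulation (perturb the weights along the normalized gradient, then descend using the gradient at the perturbed point). It is not the paper's code: the toy quadratic loss, the `CURV` curvature values, and the function names `grad`, `gd_step`, and `sam_step` are all assumptions made for illustration.

```python
import math

# Hypothetical toy loss: L(w) = 0.5 * (CURV[0] * w0^2 + CURV[1] * w1^2).
# The two coordinates stand in for a sharp and a flat direction.
CURV = (10.0, 1.0)


def grad(w):
    """Gradient of the toy quadratic loss at w."""
    return [CURV[0] * w[0], CURV[1] * w[1]]


def gd_step(w, lr):
    """Plain gradient descent: w <- w - lr * grad(w)."""
    g = grad(w)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def sam_step(w, lr, rho):
    """One SAM step: ascend by rho along the normalized gradient,
    then update the original weights with the gradient taken there."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point, nothing to perturb
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    w0 = [1.0, 1.0]
    print("GD :", gd_step(w0, lr=0.1))
    print("SAM:", sam_step(w0, lr=0.1, rho=0.05))
```

With `rho` set to 0 the SAM step reduces to the GD step, which is a quick sanity check that the only difference between the two is where the gradient is evaluated.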

generalization performance, in-distribution generalization, training data distribution, (3 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)