Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Neural Information Processing Systems 

Despite their prevalence in the deep-learning community, over-parameterized models demand substantial computational cost to train properly. This work studies the fine-grained, module-level learning dynamics of over-parameterized models in order to derive a more efficient training strategy. Empirical evidence reveals that, when we scale down to network modules such as the heads of a self-attention model, each module exhibits a distinct learning pattern that is implicitly tied to its trainability. To characterize these module-level learning capabilities, we introduce a novel concept dubbed the modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue \lambda_{\max}. A large \lambda_{\max} indicates that the module learns features with good convergence, whereas a small one may harm generalization.
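To make the mNTK concrete, the sketch below (an assumption on our part, not the authors' released code) estimates \lambda_{\max} for a chosen module in PyTorch: it stacks the Jacobians of the network outputs with respect to only that module's parameters over a batch, forms the Gram matrix K_m = J_m J_m^T, and returns its largest eigenvalue. The function and model names are illustrative; an efficient implementation would replace the per-output loop with vectorized Jacobian routines.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: principal eigenvalue of a module-wise NTK.
# The mNTK of a module is K_m = J_m J_m^T, where J_m stacks the Jacobians
# of the network outputs w.r.t. that module's parameters over a batch.

def module_ntk_lambda_max(model, module, inputs):
    """Return the largest eigenvalue of the mNTK restricted to `module`."""
    params = [p for p in module.parameters() if p.requires_grad]
    outputs = model(inputs).reshape(-1)           # flatten batch outputs

    rows = []
    for out in outputs:                           # one Jacobian row per output
        grads = torch.autograd.grad(out, params, retain_graph=True,
                                    allow_unused=True)
        flat = torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, params)
        ])
        rows.append(flat)

    J = torch.stack(rows)                         # (num_outputs, num_params)
    K = J @ J.T                                   # modular NTK (Gram matrix)
    return torch.linalg.eigvalsh(K)[-1].item()    # principal eigenvalue


if __name__ == "__main__":
    # Toy example: a two-layer MLP, treating each linear layer as a "module".
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    x = torch.randn(32, 8)
    for name, layer in [("layer0", model[0]), ("layer2", model[2])]:
        print(name, module_ntk_lambda_max(model, layer, x))
```

Comparing the resulting \lambda_{\max} values across modules (e.g., across attention heads) is what lets one tell well-converging modules from poorly trainable ones, as the abstract describes.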