Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Neural Information Processing Systems
Regularization is crucial in modern machine learning, and it can take various forms in algorithmic design: the choice of training set, model family, error function, explicit regularization terms, and optimization method. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies amount to schedules that decay the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, which characterizes the implicit self-regularization of individual layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during training, yielding improved test performance.
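As a concrete illustration of how an HT-SR metric can drive layer-wise temperature balancing, the sketch below estimates a power-law exponent alpha for each layer's empirical spectral density (via a Hill-type estimator) and maps it to a per-layer learning-rate multiplier. This is a minimal sketch assuming PyTorch; the estimator, the linear alpha-to-LR mapping, and all names (`hill_alpha`, `balanced_lrs`, `k_frac`, `lr_range`) are illustrative assumptions, not the paper's exact TempBalance algorithm.

```python
# Illustrative HT-SR-style layer-wise learning-rate balancing.
# The Hill estimator and linear alpha-to-LR mapping are common choices in
# the HT-SR literature; the published TempBalance method may differ.
import torch


def hill_alpha(weight: torch.Tensor, k_frac: float = 0.1) -> float:
    """Estimate the power-law exponent alpha of the empirical spectral
    density of W^T W using a Hill-type estimator on the top-k eigenvalues."""
    W = weight.detach().flatten(1) if weight.dim() > 2 else weight.detach()
    eigs = torch.linalg.svdvals(W) ** 2  # eigenvalues of W^T W, descending
    k = max(2, min(len(eigs), int(len(eigs) * k_frac)))  # tail size of fit
    tail = eigs[:k]
    # Hill / power-law MLE: alpha = 1 + k / sum(log(lambda_i / lambda_min))
    return 1.0 + k / torch.log(tail / tail[k - 1]).sum().item()


def balanced_lrs(model: torch.nn.Module, base_lr: float,
                 lr_range: tuple = (0.5, 1.5)) -> list:
    """Build optimizer parameter groups whose learning rates (temperatures)
    are scaled by each layer's alpha, normalized across layers: layers with
    larger alpha (less heavy-tailed, presumed less self-regularized) get a
    higher temperature under this mapping."""
    layers = [(n, p) for n, p in model.named_parameters() if p.dim() >= 2]
    alphas = [hill_alpha(p) for _, p in layers]
    lo, hi = min(alphas), max(alphas)
    groups = []
    for (name, p), a in zip(layers, alphas):
        t = 0.5 if hi == lo else (a - lo) / (hi - lo)  # alpha in [0, 1]
        scale = lr_range[0] + t * (lr_range[1] - lr_range[0])
        groups.append({"params": [p], "lr": base_lr * scale})
    return groups
```

The returned parameter groups can be passed directly to a standard optimizer, e.g. `torch.optim.SGD(balanced_lrs(model, 0.1), momentum=0.9)`, and the multipliers can be recomputed periodically during training as the layer spectra evolve.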