regularization scheme
Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization
The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows one to derive various regularization schemes that yield faster convergence rates for the excess risk than Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characterize the difficulty of the learning task. In order to understand estimators derived from other loss functions, Marteau-Ferey et al. have extended the theory of Tikhonov regularization to generalized self-concordant (GSC) loss functions, which include, e.g., the logistic loss. In this paper, we go a step further and show that fast and optimal rates can be achieved for GSC losses by using the iterated Tikhonov regularization scheme, which is intrinsically related to the proximal point method in optimization and overcomes the limitations of classical Tikhonov regularization.
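As a minimal sketch of that connection (notation here is illustrative, not the paper's): plain Tikhonov computes $\hat w_\lambda = \arg\min_w \hat L(w) + \lambda \|w\|^2$ for an empirical risk $\hat L$, whereas iterated Tikhonov starts from $w_0 = 0$ and repeats $w_{t+1} = \arg\min_w \hat L(w) + \lambda \|w - w_t\|^2$, which is exactly the proximal point iteration $w_{t+1} = \mathrm{prox}_{\hat L/\lambda}(w_t)$. For least squares, $t$ such steps act as a higher-order spectral filter, which is how the scheme escapes the saturation that caps the rates of a single Tikhonov step.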
Decreasing Entropic Regularization Averaged Gradient for Semi-Discrete Optimal Transport
Genans, Ferdinand, Godichon-Baggioni, Antoine, Vialard, François-Xavier, Wintenberger, Olivier
Adding entropic regularization to Optimal Transport (OT) problems has become a standard approach for designing efficient and scalable solvers. However, regularization biases the solver away from the true solution. To mitigate this bias while still benefiting from the acceleration provided by regularization, a natural solver would adaptively decrease the regularization as it approaches the solution. Although some algorithms heuristically implement this idea, their theoretical guarantees and the extent of their acceleration compared to using a fixed regularization remain largely open. In the setting of semi-discrete OT, where the source measure is continuous and the target is discrete, we prove that decreasing the regularization can indeed accelerate convergence. To this end, we introduce DRAG: Decreasing (entropic) Regularization Averaged Gradient, a stochastic gradient descent algorithm where the regularization decreases with the number of optimization steps. We provide a theoretical analysis showing that DRAG benefits from decreasing regularization compared to a fixed scheme, achieving an unbiased $\mathcal{O}(1/t)$ sample and iteration complexity for both the OT cost and the potential estimation, and a $\mathcal{O}(1/\sqrt{t})$ rate for the OT map. Our theoretical findings are supported by numerical experiments that validate the effectiveness of DRAG and highlight its practical advantages.
- North America > United States (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
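Since the abstract does not spell out the update rule, here is a minimal sketch of a DRAG-style solver as an averaged stochastic gradient method on the standard smoothed semi-dual of semi-discrete OT; the schedules eps_t and lr_t below are illustrative assumptions, not the schedules analyzed in the paper.

import numpy as np

def drag(sample_source, y, nu, n_steps, eps0=1.0, lr0=1.0):
    # sample_source: callable returning one sample x ~ mu (continuous source)
    # y: (n, d) support of the discrete target; nu: (n,) positive weights summing to 1
    n = len(nu)
    v = np.zeros(n)        # dual potential on the discrete support
    v_avg = np.zeros(n)    # Polyak-Ruppert average, returned as the estimate
    for t in range(1, n_steps + 1):
        eps_t = eps0 / t             # decreasing entropic regularization
        lr_t = lr0 / np.sqrt(t)      # step size
        x = sample_source()
        cost = 0.5 * np.sum((y - x) ** 2, axis=1)   # c(x, y_j), squared-Euclidean
        logits = (v - cost) / eps_t + np.log(nu)    # Gibbs weights chi(x)
        logits -= logits.max()                      # numerical stability
        chi = np.exp(logits)
        chi /= chi.sum()
        v += lr_t * (nu - chi)       # stochastic semi-dual gradient: nu - chi(x)
        v_avg += (v - v_avg) / t
    return v_avg

For instance, sample_source = lambda: rng.standard_normal(2) with a random (y, nu) target recovers the classical semi-discrete setting, with the entropic bias vanishing as eps_t goes to 0.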
Enhancing the Cross-Size Generalization for Solving Vehicle Routing Problems via Continual Learning
Li, Jingwen, Cao, Zhiguang, Wu, Yaoxin, Liu, Tang
Exploring machine learning techniques for addressing vehicle routing problems has attracted considerable research attention. To achieve decent and efficient solutions, existing deep models for vehicle routing problems are typically trained and evaluated using instances of a single size. This substantially limits their ability to generalize across different problem sizes and thus hampers their practical applicability. To address the issue, we propose a continual learning based framework that sequentially trains a deep model with instances of ascending problem sizes. Specifically, on the one hand, we design an inter-task regularization scheme to retain the knowledge acquired from smaller problem sizes when training the model on a larger size. On the other hand, we introduce an intra-task regularization scheme to consolidate the model by imitating the latest desirable behaviors during training on each size. Additionally, we exploit experience replay, revisiting instances of formerly trained sizes to mitigate catastrophic forgetting. Experimental results show that our approach achieves predominantly superior performance across various problem sizes (either seen or unseen in training), compared to state-of-the-art deep models, including ones specialized for generalizability enhancement. Meanwhile, ablation studies on the key designs confirm their synergistic effect in the proposed framework.
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Asia > Singapore (0.04)
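A structural sketch of the continual-learning loop described above, assuming a policy-gradient objective rl_loss(model, batch) and an instance generator make_batch(size) as placeholders; the intra-task regularizer, which imitates the latest desirable behaviors within a size, is omitted for brevity.

import copy
import random
import torch
import torch.nn.functional as F

def train_across_sizes(model, optimizer, sizes, make_batch, rl_loss,
                       steps_per_size=1000, beta=1.0, replay_prob=0.2):
    teacher = None     # frozen snapshot of the model trained on the previous size
    replay = []        # experience replay over formerly trained sizes
    for size in sorted(sizes):
        for _ in range(steps_per_size):
            batch = make_batch(size)
            loss = rl_loss(model, batch)
            if teacher is not None:
                # Inter-task regularization: keep the policy close to the one
                # learned on smaller sizes (here a KL distillation term).
                with torch.no_grad():
                    old_logits = teacher(batch)
                loss = loss + beta * F.kl_div(model(batch).log_softmax(-1),
                                              old_logits.softmax(-1),
                                              reduction="batchmean")
            if replay and random.random() < replay_prob:
                # Revisit earlier sizes to mitigate catastrophic forgetting.
                loss = loss + rl_loss(model, random.choice(replay))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        replay.append(make_batch(size))
        teacher = copy.deepcopy(model).eval()   # snapshot for the next size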
Appendix A: Source codes
Specifically, we average the scores over 100 episodes evaluated on confounded environments for each random seed. We use the Adam optimizer with a learning rate of 3e-4. Note that the other regularization baselines are based on BC. In particular, OREO achieves a mean HNS of 114.9%.
[Figure 9: Comparison of OREO to CCIL, with environment interaction, on 6 confounded Atari environments.]
We also investigate the possibility of applying OREO to other IL methods.
The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and Collaborative Filtering. Following the convention of RS, existing methods exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, we argue that the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), with the hope of considering the commonly ignored minority interest of the user.
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Leisure & Entertainment (0.92)
- Information Technology (0.67)
- Media > Film (0.45)
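The abstract leaves the mechanism implicit; the core design is to give each user several embedding vectors and score an item by its distance to the nearest one, so a minority interest can claim its own vector. A minimal sketch, where the number of interest vectors, the margin, and the hinge form are assumptions:

import torch
import torch.nn as nn

class DPCMLSketch(nn.Module):
    # Each user owns C embedding vectors instead of a single one.
    def __init__(self, n_users, n_items, dim=64, n_interests=4, margin=1.0):
        super().__init__()
        self.user = nn.Embedding(n_users * n_interests, dim)
        self.item = nn.Embedding(n_items, dim)
        self.C, self.margin = n_interests, margin

    def dist(self, u, i):
        # Distance from item i to the closest of user u's C vectors.
        idx = u.unsqueeze(1) * self.C + torch.arange(self.C, device=u.device)
        d = torch.cdist(self.user(idx), self.item(i).unsqueeze(1))  # (B, C, 1)
        return d.squeeze(-1).min(dim=1).values

    def forward(self, u, pos, neg):
        # Standard CML hinge loss, computed with the multi-vector distance,
        # so gradients flow to whichever interest vector is closest.
        return torch.relu(self.dist(u, pos) - self.dist(u, neg) + self.margin).mean()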
Stabilizing PINNs: A regularization scheme for PINN training to avoid unstable fixed points of dynamical systems
Babic, Milos, Rohrhofer, Franz M., Geiger, Bernhard C.
Abstract: It was recently shown that the loss function used for training physics-informed neural networks (PINNs) exhibits local minima at solutions corresponding to fixed points of dynamical systems. In the forward setting, where the PINN is trained to solve initial value problems, these local minima can interfere with training and potentially lead to physically incorrect solutions. Building on stability theory, this paper proposes a regularization scheme that penalizes solutions corresponding to unstable fixed points. Experimental results on four dynamical systems, including the Lotka-Volterra model and the van der Pol oscillator, show that our scheme helps avoid physically incorrect solutions and substantially improves the training success rate of PINNs.
Index Terms: PINNs, regularization, stability
1. Introduction. Physics-informed neural networks (PINNs, [1]) are among the most prominent instantiations of physics-informed machine learning.
- Europe > Austria > Styria > Graz (0.05)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
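The abstract says the scheme penalizes solutions corresponding to unstable fixed points but not its functional form. A plausible sketch for an autonomous system u' = f(u): precompute the fixed points whose Jacobian has an eigenvalue with positive real part, then add a repulsion term at the collocation points. The Gaussian bump and its width are assumptions, not the paper's choice.

import torch

def stability_penalty(u_pred, unstable_fps, sigma=0.1):
    # u_pred:       (T, d) PINN output at the collocation times
    # unstable_fps: (K, d) fixed points of f with Re(lambda_max) > 0
    d2 = torch.cdist(u_pred, unstable_fps) ** 2   # (T, K) squared distances
    return torch.exp(-d2 / (2 * sigma ** 2)).mean()

# total = physics_residual + data_loss + weight * stability_penalty(u_pred, fps)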
Who Does What in Deep Learning? Multidimensional Game-Theoretic Attribution of Function of Neural Units
Dixit, Shrey, Fakhar, Kayson, Hadaeghi, Fatemeh, Mineault, Patrick, Kording, Konrad P., Hilgetag, Claus C.
Neural networks now generate text, images, and speech with billions of parameters, creating a need to know how each neural unit contributes to these high-dimensional outputs. Existing explainable-AI methods, such as SHAP, attribute importance to inputs but cannot quantify the contributions of neural units across thousands of output pixels, tokens, or logits. Here we close that gap with Multiperturbation Shapley-value Analysis (MSA), a model-agnostic game-theoretic framework. By systematically lesioning combinations of units, MSA yields Shapley Modes, unit-wise contribution maps that share the exact dimensionality of the model's output. We apply MSA across scales, from multi-layer perceptrons to the 56-billion-parameter Mixtral-8x7B and Generative Adversarial Networks (GANs). The approach demonstrates how regularisation concentrates computation in a few hubs, exposes language-specific experts inside the LLM, and reveals an inverted pixel-generation hierarchy in GANs. Together, these results showcase MSA as a powerful approach for interpreting, editing, and compressing deep neural networks.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > California > Alameda County > Oakland (0.04)
- (7 more...)
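As a concrete reading of the method, a Monte-Carlo permutation sketch of MSA: lesion units in random orders, record each unit's marginal effect on the full output, and average. The estimator below is the textbook permutation-sampling Shapley approximation; the paper's exact sampling scheme may differ.

import numpy as np

def shapley_modes(forward, n_units, n_perms=200, seed=0):
    # forward(mask): model output (any shape) with units where mask == 0 lesioned.
    rng = np.random.default_rng(seed)
    base = forward(np.zeros(n_units))              # everything lesioned
    modes = np.zeros((n_units,) + np.shape(base))  # one Shapley Mode per unit
    for _ in range(n_perms):
        order = rng.permutation(n_units)
        mask = np.zeros(n_units)
        prev = base
        for u in order:
            mask[u] = 1.0               # restore unit u to the coalition
            out = forward(mask)
            modes[u] += out - prev      # marginal contribution of unit u
            prev = out
    return modes / n_perms              # each mode shares the output's shape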