A Closed form expressions for the robust risks
In Sections A.1 and A.2 we derive closed-form expressions of the standard and robust risks from We first prove Equation (13). We now prove the second part of the statement. In this section we provide additional details on our experiments. B.1 Neural networks on sanitized binary MNIST If not mentioned otherwise, we use noiseless i.i.d. C.1 we give an intuitive explanation for the robust overfitting phenomenon described in C.2 we discuss how inconsistent adversarial training prevents We now shed light on the phenomena revealed by Theorem 3.1 and Figure 2. In particular, we In this section we further discuss robust logistic regression studied in Section 4. As observed in Section 4.4, label noise can prevent interpolation and hence improve the robust risk Hence, inconsistent training perturbations can induce spurious regularization effects.
Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds
Metelli, Alberto Maria, Drago, Simone, Mussi, Marco
We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS), having access to samples generated by an exponential family (EF) noise model whose mean is a non-linear function $μ(f^*)$. This model extends both kernelized bandits (KBs) and generalized linear bandits (GLBs). We propose an optimistic algorithm, GKB-UCB, and we explain why existing self-normalized concentration inequalities do not yield tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality, resorting to Freedman's inequality and a stitching argument, which represents a contribution of independent interest. Based on it, we conduct a regret analysis of GKB-UCB, deriving a regret bound of order $\widetilde{O}( γ_T \sqrt{T/κ_*})$, where $T$ is the learning horizon, $γ_T$ is the maximal information gain, and $κ_*$ is a term characterizing the magnitude of the reward nonlinearity. Our result matches, up to multiplicative constants and logarithmic terms, the state-of-the-art bounds for both KBs and GLBs and provides a unified view of both settings.
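To make the quantity $γ_T$ in the bound more concrete, the following is a minimal sketch, not taken from the paper, of the information gain $\frac{1}{2}\log\det(I + λ^{-1}K)$ of a fixed set of points under a squared-exponential kernel. The maximal information gain is the maximum of this quantity over all sets of $T$ points, so the value computed here is only a lower bound on it. The kernel choice, the regularizer `lam`, the lengthscale, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of X and rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def information_gain(X, lam=1.0, lengthscale=1.0):
    """Return 0.5 * log det(I + K / lam) for the points in X (assumed form)."""
    K = rbf_kernel(X, X, lengthscale)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / lam)
    return 0.5 * logdet

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))  # hypothetical query points

# Information gain of the first 1, 10, and 50 points.
gains = [information_gain(X[:t]) for t in (1, 10, 50)]
print(gains)
```

Because the sets are nested, the printed values increase with the number of points; how fast they grow depends on the kernel, which is why bounds stated in terms of $γ_T$ are kernel-dependent.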
Semi-gradient DICE for Offline Constrained Reinforcement Learning
Kim, Woosung, Seo, JunHo, Lee, Jongmin, Lee, Byung-Jun
Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on a semi-gradient optimization, which solves a fundamentally different optimization problem and results in failures in cost estimation. Building on these insights, we propose a novel method to enable OPE and constrained RL through semi-gradient DICE. Our method ensures accurate cost estimation and achieves state-of-the-art performance on the offline constrained RL benchmark, DSRL.
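The distinction between a full gradient and a semi-gradient can be illustrated on a generic squared Bellman residual with a linear value function. This is a minimal sketch under stated assumptions: it is not the DICE objective or the method proposed in the paper, and the features, rewards, and function names are hypothetical. The semi-gradient treats the bootstrapped next-state value as a constant, so it is not the gradient of the loss it appears to minimize.

```python
import numpy as np

def bellman_residual(w, phi_s, phi_next, reward, gamma):
    """Residual r + gamma * v(s') - v(s) for a linear value v(s) = phi(s) @ w."""
    return reward + gamma * phi_next @ w - phi_s @ w

def td_loss(w, phi_s, phi_next, reward, gamma):
    """Half the mean squared Bellman residual."""
    return 0.5 * np.mean(bellman_residual(w, phi_s, phi_next, reward, gamma) ** 2)

def full_gradient(w, phi_s, phi_next, reward, gamma):
    """True gradient: differentiates through both v(s) and v(s')."""
    res = bellman_residual(w, phi_s, phi_next, reward, gamma)
    return (gamma * phi_next - phi_s).T @ res / len(reward)

def semi_gradient(w, phi_s, phi_next, reward, gamma):
    """Semi-gradient: the target r + gamma * v(s') is held fixed."""
    res = bellman_residual(w, phi_s, phi_next, reward, gamma)
    return (-phi_s).T @ res / len(reward)

rng = np.random.default_rng(0)
n, d, gamma = 200, 4, 0.9
phi_s = rng.normal(size=(n, d))      # hypothetical state features
phi_next = rng.normal(size=(n, d))   # hypothetical next-state features
reward = rng.normal(size=n)
w = rng.normal(size=d)

g_full = full_gradient(w, phi_s, phi_next, reward, gamma)
g_semi = semi_gradient(w, phi_s, phi_next, reward, gamma)
print(np.linalg.norm(g_full - g_semi))  # nonzero: the two update directions differ
```

The two directions coincide only when the term involving the next-state features vanishes, which gives some intuition for why a semi-gradient procedure can converge to the solution of a different problem.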
Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features
Veiga, Rodrigo, Remizova, Anastasia, Macris, Nicolas
In supervised learning of neural networks and regression models, understanding the dynamics of optimization algorithms, and in particular stochastic gradient descent (SGD), is of utmost importance. However, despite much progress in a number of directions, this remains a highly challenging theoretical problem. A fruitful approach that allows analytical progress consists of suitably approximating SGD by a continuous-time process, henceforth called stochastic gradient flow (SGF). In this contribution, we build on this approach to develop a general formalism characterizing the dynamics of the stochastic process, and apply it to the investigation of the test risk (or generalization error) as a function of time. As is well known, the classical bias-variance trade-off has been challenged in a number of models displaying the double descent phenomenon [1, 2, 3]. Analytical derivations of double descent curves have been achieved for relatively simple models, but are limited to the use of least squares estimators (no dynamics) and pure gradient flow (GF) approximations of gradient descent (GD). The present work goes one step further by investigating the effects of stochasticity on the double descent curve. Our main contributions are summarized as follows: C1 We consider a general Itô stochastic differential equation (SDE) and represent the Markovian transition probability as a path integral, Eq. (12). A general 'explicit' formula for the transition probability, Eq. (18), is derived in the limit of a small learning rate by using a Laplace approximation.
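As a rough illustration of what a continuous-time approximation of SGD looks like, the sketch below simulates a one-dimensional Itô SDE with the Euler-Maruyama scheme for a quadratic loss. It is not the formalism of the paper: the drift, the diffusion term scaled by the square root of the learning rate (a common convention in SDE models of SGD, assumed here), and all parameter names are illustrative assumptions.

```python
import numpy as np

def simulate_sgf(theta0, curvature, noise_scale, learning_rate, horizon, dt, rng):
    """Euler-Maruyama path of d(theta) = -curvature * theta dt + sqrt(lr) * noise_scale dW."""
    n_steps = int(round(horizon / dt))
    theta = np.empty(n_steps + 1)
    theta[0] = theta0
    diffusion = np.sqrt(learning_rate) * noise_scale
    for k in range(n_steps):
        drift = -curvature * theta[k]            # gradient of 0.5 * curvature * theta**2
        dW = np.sqrt(dt) * rng.standard_normal() # Brownian increment over dt
        theta[k + 1] = theta[k] + drift * dt + diffusion * dW
    return theta

rng = np.random.default_rng(0)

# With noise_scale = 0 the SDE reduces to plain gradient flow, theta(t) = theta0 * exp(-t).
noiseless = simulate_sgf(1.0, 1.0, 0.0, 0.1, 1.0, 1e-3, rng)

# With noise, the path fluctuates around the gradient-flow trajectory.
noisy = simulate_sgf(1.0, 1.0, 1.0, 0.1, 1.0, 1e-3, rng)
print(noiseless[-1], noisy[-1])
```

Setting the noise to zero recovers the pure gradient flow limit mentioned in the abstract, so comparing the two paths gives a simple picture of what the stochastic term adds.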