Goto

Collaborating Authors

 kernel loss




Reviews: A Kernel Loss for Solving the Bellman Equation

Neural Information Processing Systems

Originality: The derivation of the loss function is original; the resulting loss function has some close similarities with the coupled formulation of LSTD, which should be discussed. Quality: The claims seem to be accurate (I briefly verified the proofs of Theorem 3.1, Proposition 3.3, Proposition 3.4; I did not verify Theorem 3.2 and Corollary 3.5). Clarity: The paper is well-written and clear. Significance: The addressed problem is important; the insights are also useful. SUMMARY: The paper addresses the problem of designing a new loss function for RL.


Reviews: A Kernel Loss for Solving the Bellman Equation

Neural Information Processing Systems

There is general consensus that the idea introduced in the paper is novel and interesting. Yet, I encourage the authors to read carefully the reviewers' comments and take them into consideration in the camera ready. In particular, the connection with the nested formulation of LSTD should be discussed to frame the contribution of the paper better.


A Kernel Loss for Solving the Bellman Equation

Neural Information Processing Systems

Value function learning plays a central role in many state-of-the-art reinforcement learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variants of Bellman operator that are not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods with guaranteed convergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms like residual gradient.


A Kernel Loss for Solving the Bellman Equation

Neural Information Processing Systems

Value function learning plays a central role in many state-of-the-art reinforcement learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variants of Bellman operator that are not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods with guaranteed convergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms like residual gradient.


A Kernel Loss for Solving the Bellman Equation

arXiv.org Machine Learning

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variant of Bellman operator that is not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods without risking divergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms like residual gradient. Our approach may be combined with general function classes such as neural networks, on either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks.


Online learning with kernel losses

arXiv.org Machine Learning

We present a generalization of the adversarial linear bandits framework, where the underlying losses are kernel functions (with an associated reproducing kernel Hilbert space) rather than linear functions. We study a version of the exponential weights algorithm and bound its regret in this setting. Under conditions on the eigendecay of the kernel we provide a sharp characterization of the regret for this algorithm. When we have polynomial eigendecay $\mu_j \le \mathcal{O}(j^{-\beta})$, we find that the regret is bounded by $\mathcal{R}_n \le \mathcal{O}(n^{\beta/(2(\beta-1))})$; while under the assumption of exponential eigendecay $\mu_j \le \mathcal{O}(e^{-\beta j })$, we get an even tighter bound on the regret $\mathcal{R}_n \le \mathcal{O}(n^{1/2}\log(n)^{1/2})$. We also study the full information setting when the underlying losses are kernel functions and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.