Bellman Residual Orthogonalization for Offline Reinforcement Learning
We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.
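To make the idea of enforcing the Bellman equations only along test functions concrete, the following is a minimal sketch for the linear case. It is an illustration, not the paper's procedure: it computes only a point estimate and omits the confidence intervals and policy optimization described above. The function name `orthogonalized_q_weights`, the argument layout, and the toy two-state chain are all assumptions made here. With a linear value estimate Q(s, a) = phi(s, a)^T w, the code solves the empirical conditions sum_i psi_i (phi_i^T w - r_i - gamma * phi'_i^T w) = 0 over the test functions psi; when psi equals phi this is the familiar least-squares temporal-difference system.

```python
import numpy as np

def orthogonalized_q_weights(phi, rewards, phi_next, psi=None, gamma=0.9):
    """Fit linear Q-weights by making Bellman residuals orthogonal to test functions.

    phi      : (n, d) features of the observed state-action pairs
    rewards  : (n,)   observed rewards
    phi_next : (n, d) features of (next state, target-policy action)
    psi      : (n, m) test-function values; defaults to phi (hypothetical choice)
    """
    psi = phi if psi is None else psi
    n = phi.shape[0]
    # Empirical moment conditions: A @ w = b
    A = psi.T @ (phi - gamma * phi_next) / n
    b = psi.T @ rewards / n
    # Least squares also covers the case of more test functions than weights
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Toy two-state chain (assumed for illustration): 0 -> 1 -> 1,
# reward 0 in state 0 and reward 1 in state 1, one-hot features.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
rewards = np.array([0.0, 1.0])
w = orthogonalized_q_weights(phi, rewards, phi_next, gamma=0.5)
```

In this toy chain the exact values are Q(1) = 1 / (1 - 0.5) = 2 and Q(0) = 0.5 * 2 = 1, which the solve recovers. Passing a different `psi` changes which projections of the Bellman residual are forced to zero.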
Post-hoc gradient-based interpretability methods [1, 2] that provide instance-specific explanations of model predictions are often based on assumption (A): the magnitude of input gradients (gradients of logits with respect to the input) noisily highlights discriminative task-relevant features. In this work, we test the validity of assumption (A) using a three-pronged approach: 1. We develop an evaluation framework, DiffROAR, to test assumption (A) on four image classification benchmarks. Our results suggest that (i) input gradients of standard models (i.e., trained on original data) may grossly violate (A), whereas (ii) input gradients of adversarially robust models satisfy (A) reasonably well.
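As a small illustration of the quantity that assumption (A) refers to, the sketch below computes the gradient of one logit with respect to the input for a tiny one-hidden-layer ReLU network and ranks input coordinates by gradient magnitude. The network shape, the weights, and the function names are assumptions made for this example. It does not implement DiffROAR.

```python
import numpy as np

def logit_input_gradient(x, W1, b1, W2, label):
    """Gradient of logits[label] w.r.t. x, where logits = W2 @ relu(W1 @ x + b1) + b2.

    The output bias b2 does not affect the gradient, so it is not an argument.
    """
    pre_activation = W1 @ x + b1
    active = (pre_activation > 0).astype(x.dtype)  # ReLU derivative
    return W1.T @ (active * W2[label])

def rank_features_by_gradient_magnitude(grad):
    """Indices of input coordinates, largest |gradient| first."""
    return np.argsort(-np.abs(grad))

# Hypothetical weights chosen so the result is easy to check by hand
x = np.array([1.0, -2.0, 3.0])
W1 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0]])
b1 = np.zeros(2)
W2 = np.array([[2.0, 5.0]])

grad = logit_input_gradient(x, W1, b1, W2, label=0)
ranking = rank_features_by_gradient_magnitude(grad)
```

Here only the first hidden unit is active, so the gradient is 2 times the first row of `W1`, and the third input coordinate ranks highest.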
Tree-based models are used in many high-stakes application domains such as finance and medicine, where robustness and interpretability are of utmost importance. Yet, methods for improving and certifying their robustness are severely under-explored, in contrast to those focusing on neural networks. Targeting this important challenge, we propose deterministic smoothing for decision stump ensembles. Whereas most prior work on randomized smoothing focuses on evaluating arbitrary base models approximately under input randomization, the key insight of our work is that decision stump ensembles enable exact yet efficient evaluation via dynamic programming. Importantly, we obtain deterministic robustness certificates, even jointly over numerical and categorical features, a setting ubiquitous in the real world. Further, we derive an MLE-optimal training method for smoothed decision stumps under randomization and propose two boosting approaches to improve their provable robustness. An extensive experimental evaluation on computer vision and tabular data tasks shows that our approach yields significantly higher certified accuracies than the state-of-the-art for tree-based models. We release all code and trained models at https://github.com/eth-sri/drs.
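The claim that stump ensembles admit exact evaluation under input randomization can be illustrated with a deliberately simplified sketch. The assumptions here are not taken from the paper: exactly one stump per feature, binary votes, independent Gaussian noise on each feature, and a majority-vote ensemble. Under those assumptions the votes are independent, so the exact distribution of the vote count follows from a short dynamic program. All function names are hypothetical, and this is not the authors' algorithm.

```python
import math

def vote_probability(x_i, threshold, sigma):
    """P(x_i + eps > threshold) for eps ~ N(0, sigma^2)."""
    z = (x_i - threshold) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def vote_count_distribution(probs):
    """dist[k] = P(exactly k stumps vote 1), assuming independent votes."""
    dist = [1.0]  # with zero stumps, the count is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # this stump votes 0
            new[k + 1] += mass * p       # this stump votes 1
        dist = new
    return dist

def smoothed_majority_probability(x, thresholds, sigma):
    """Exact probability that a strict majority of stumps votes 1 under noise."""
    probs = [vote_probability(xi, t, sigma) for xi, t in zip(x, thresholds)]
    dist = vote_count_distribution(probs)
    majority = len(probs) // 2 + 1
    return sum(dist[majority:])

p_majority = smoothed_majority_probability([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], sigma=1.0)
```

The dynamic program runs in time quadratic in the number of stumps and involves no sampling, which is why the resulting probability is exact rather than a Monte Carlo estimate.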
INDIGO: GNN-Based Inductive Knowledge Graph Completion Using Pair-Wise Encoding
The aim of knowledge graph (KG) completion is to extend an incomplete KG with missing triples. Popular approaches based on graph embeddings typically work by first representing the KG in a vector space, and then applying a predefined scoring function to the resulting vectors to complete the KG. These approaches work well in transductive settings, where predicted triples involve only constants seen during training; however, they are not applicable in inductive settings, where the KG on which the model was trained is extended with new constants or merged with other KGs. The use of Graph Neural Networks (GNNs) has recently been proposed as a way to overcome these limitations; however, existing approaches do not fully exploit the capabilities of GNNs and still rely on heuristics and ad hoc scoring functions. In this paper, we propose a novel approach, where the KG is fully encoded into a GNN in a transparent way, and where the predicted triples can be read out directly from the last layer of the GNN without the need for additional components or scoring functions. Our experiments show that our model outperforms state-of-the-art approaches on inductive KG completion benchmarks.
Beatbot Pool-Cleaning Robots Are on Sale for a Limited Time
Get ready for summer with discounts on robot pool cleaners from Beatbot. National Pool Opening Day is tomorrow, April 25, and summer is almost here, which means pool owners everywhere are getting ready to unveil the horrors of whatever happened during the off-season. Most of the Beatbot lineup is on sale at Amazon and Beatbot's own storefront, with prices starting at $499. Beatbot makes many of the best pool-cleaning robots we've tested, and we've highlighted our top picks below. Note that the discounts are scheduled to end on April 26, though items may sell out sooner.