Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations
Training-time safety violations have been a major concern when we deploy reinforcement learning algorithms in the real world. This paper explores the possibility of safe RL algorithms with zero training-time safety violations in the challenging setting where we are only given a safe but trivial-reward initial policy, without any prior knowledge of the dynamics and without additional offline data. We propose an algorithm, Co-trained Barrier Certificate for Safe RL (CRABS), which iteratively learns barrier certificates, dynamics models, and policies. The barrier certificates are learned via adversarial training and ensure the policy's safety assuming calibrated learned dynamics. We also add a regularization term to encourage larger certified regions to enable better exploration. Empirical simulations show that zero safety violations are already challenging for a suite of simple environments with only 2-4 dimensional state spaces, especially if high-reward policies have to visit regions near the safety boundary. Prior methods require hundreds of violations to achieve decent rewards on these tasks, whereas our proposed algorithm incurs zero violations.
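For reference (not the paper's exact formulation), a discrete-time barrier certificate h for a policy π under a learned, calibrated dynamics model certifies the superlevel set {s : h(s) ≥ 0} through conditions of roughly the following form, with s_0 the initial state and S_unsafe the unsafe states:

\[
h(s_0) \ge 0, \qquad
h(s) < 0 \;\; \forall s \in \mathcal{S}_{\text{unsafe}}, \qquad
h(s) \ge 0 \;\Rightarrow\; h\big(\hat{f}(s, \pi(s))\big) \ge 0 \;\; \text{for all } \hat{f} \text{ in the calibrated set}.
\]

Adversarial training searches for states that violate the last (forward-invariance) condition, while the regularization term mentioned above pushes the certified region {s : h(s) ≥ 0} to be as large as possible.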
FedSR: A Simple and Effective Domain Generalization Method for Federated Learning
Federated Learning (FL) refers to the decentralized and privacy-preserving machine learning framework in which multiple clients collaborate (with the help of a central server) to train a global model without sharing their data. However, most existing FL methods only focus on maximizing the model's performance on the source clients' data (e.g., mobile users) without considering its generalization ability to unknown target data (e.g., a new user). In this paper, we incorporate the problem of Domain Generalization (DG) into Federated Learning to tackle the aforementioned issue. However, virtually all existing DG methods require a centralized setting where data is shared across the domains, which violates the principles of decentralized FL and hence makes them inapplicable. To this end, we propose a simple yet novel representation learning framework, namely FedSR, which enables domain generalization while still respecting the decentralized and privacy-preserving nature of the FL setting. Motivated by classical machine learning algorithms, we aim to learn a simple representation of the data for better generalization. In particular, we enforce an L2-norm regularizer on the representation and a conditional mutual information (between the representation and the data given the label) regularizer to encourage the model to only learn essential information (while ignoring spurious correlations such as the background). Furthermore, we provide theoretical connections between the above two objectives and representation alignment in domain generalization. Extensive experimental results suggest that our method significantly outperforms relevant baselines on this problem.
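A hedged sketch of the kind of per-client objective described above (the exact weighting and the variational treatment of the conditional-mutual-information term are assumptions): with representation z = g(x), classifier f, coefficients α and β, and a reference distribution r(z | y) whose KL divergence to p(z | x) upper-bounds I(z; x | y),

\[
\min_{g, f, r}\; \mathbb{E}_{(x,y)}\Big[\, \ell\big(f(z), y\big) \;+\; \alpha\, \lVert z \rVert_2^2 \;+\; \beta\, \mathrm{KL}\big( p(z \mid x) \,\big\|\, r(z \mid y) \big) \,\Big], \qquad z \sim p(z \mid x).
\]

Both regularizers can be computed purely from a client's local data, which is what keeps such an objective compatible with the decentralized FL setting.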
Estimating Generic 3D Room Structures from 2D Annotations
Indoor rooms are among the most common use cases in 3D scene understanding. Current state-of-the-art methods for this task are driven by large annotated datasets. Room layouts are especially important, consisting of structural elements in 3D, such as walls, floors, and ceilings. However, they are difficult to annotate, especially on pure RGB video. We propose a novel method to produce generic 3D room layouts just from 2D segmentation masks, which are easy to annotate for humans. Based on these 2D annotations, we automatically reconstruct 3D plane equations for the structural elements and their spatial extent in the scene, and connect adjacent elements at the appropriate contact edges. We annotate and publicly release 2246 3D room layouts on the RealEstate10k dataset, which contains YouTube videos. We demonstrate the high quality of these 3D layout annotations with extensive experiments.
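For concreteness (a generic formulation, not necessarily the paper's exact parameterization), each structural element i can be represented by a plane

\[
\pi_i = (\mathbf{n}_i, d_i), \qquad \mathbf{n}_i^{\top}\mathbf{X} + d_i = 0 \quad \text{for 3D points } \mathbf{X} \text{ on element } i,
\]

and the contact edge between two adjacent elements i and j lies on the intersection line of the planes \(\pi_i\) and \(\pi_j\), which also bounds each element's spatial extent.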
A. A Note on Applications and Future Work
The applications of HRRs may not be immediate, given that the approach has been out of vogue amongst most machine learning practitioners for many years. Long term, we believe improvements in neurosymbolic learning are important for better generalization of ML methods to novel inputs and situations, as argued by [3]. In the near term, we do believe HRRs may have considerable opportunity to provide enhancements. Transformers, via their "query, key, value" Multi-Headed Attention (MHA), are a natural place to explore HRRs due to the match in logical design, with the potential to avoid MHA's high costs; this direction is supported by the similarly motivated analysis of Schlag et al. [4] through the lens of associative memory. The same inspiration, and other neuro-symbolic work on question-answering with TPRs [6], leads us to believe HRRs may have similar potential for such systems, and in particular as a way to extract or augment the knowledge base of a queryable system in a way that current methods do not yet allow.
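For readers unfamiliar with HRRs, the core operations (standard definitions from Plate's Holographic Reduced Representations, not specific to this appendix) are binding by circular convolution and retrieval by an approximate inverse, which is what makes the key-value analogy with attention natural:

\[
(\mathbf{a} \circledast \mathbf{b})_j = \sum_{k=0}^{d-1} a_k\, b_{(j-k) \bmod d}, \qquad
\mathbf{a}^{\dagger} = (a_0, a_{d-1}, a_{d-2}, \dots, a_1), \qquad
\mathbf{a}^{\dagger} \circledast (\mathbf{a} \circledast \mathbf{b}) \approx \mathbf{b},
\]

where the approximation holds in expectation for random vectors with i.i.d. components of variance 1/d, and circular convolution can be computed in O(d log d) time via the FFT.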
Rényi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning
Propose-Test-Release (PTR) is a differential privacy framework that works with local sensitivity of functions, instead of their global sensitivity. This framework is typically used for releasing robust statistics such as median or trimmed mean in a differentially private manner. While PTR is a common framework introduced over a decade ago, using it in applications such as robust SGD where we need many adaptive robust queries is challenging. This is mainly due to the lack of Rényi Differential Privacy (RDP) analysis, an essential ingredient underlying the moments accountant approach for differentially private deep learning. In this work, we generalize the standard PTR and derive the first RDP bound for it when the target function has bounded global sensitivity.
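For context, the Rényi Differential Privacy notion that the analysis targets (the standard definition, not a contribution of this paper): a randomized mechanism M is (α, ε)-RDP if, for every pair of neighboring datasets D and D',

\[
D_{\alpha}\big( M(D)\,\|\,M(D') \big)
= \frac{1}{\alpha - 1}\,\log\, \mathbb{E}_{o \sim M(D')}\!\left[ \left( \frac{\Pr[M(D)=o]}{\Pr[M(D')=o]} \right)^{\!\alpha} \right] \le \varepsilon .
\]

Bounds of this form compose additively over adaptive queries, which is what the moments-accountant style of analysis for differentially private deep learning relies on.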
Information Theoretic Lower Bounds for Information Theoretic Upper Bounds
We examine the relationship between the generalization of a learning algorithm and the mutual information between its output model and the empirical sample, in the context of stochastic convex optimization. Despite increasing interest in information-theoretic generalization bounds, it is uncertain whether these bounds can provide insight into the exceptional performance of various learning algorithms. Our study of stochastic convex optimization reveals that, for true risk minimization, dimension-dependent mutual information is necessary. This indicates that existing information-theoretic generalization bounds fall short in capturing the generalization capabilities of algorithms like SGD and regularized ERM, which have dimension-independent sample complexity.
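The information-theoretic upper bounds in question are of the following standard form (the bound of Xu and Raginsky for σ-sub-Gaussian losses, stated here for context): with W the output model, S the n-point sample, L_D the population risk, and L_S the empirical risk,

\[
\big| \, \mathbb{E}\big[ L_{\mathcal{D}}(W) - L_{S}(W) \big] \, \big| \;\le\; \sqrt{ \frac{2\sigma^{2}\, I(W; S)}{n} } .
\]

The lower bounds described above show that, in stochastic convex optimization, any algorithm that minimizes the true risk must have I(W; S) growing with the dimension, so bounds of this form cannot explain the dimension-independent sample complexity of SGD or regularized ERM.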
Supplementary Material: Towards Improving Calibration in Object Detection Under Domain Shift
In this supplementary material, we first present the following (additional) results of our calibration techniques: with another calibration metric, with a recent transformer-based object detector, and with a recent domain-adaptive detector. Finally, we show some qualitative results of our proposed train-time calibration loss and describe the implementation details for the different detectors considered. In the D-UCE metric (a sketch of its usual form is given after this paragraph), error(m) denotes the average error in a bin and uncertainty(m) the average uncertainty in that bin. We see that our calibration techniques, whether used individually or in combination, not only decrease the D-ECE but are also capable of reducing the D-UCE. (Table: calibration performance in terms of detection expected uncertainty calibration error, D-UCE.)
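A hedged reconstruction of the D-UCE metric referred to above, following the usual binned uncertainty-calibration-error form (the exact binning scheme used in the supplementary is an assumption): with M bins, B(m) the detections assigned to bin m, and N the total number of detections,

\[
\text{D-UCE} \;=\; \sum_{m=1}^{M} \frac{|B(m)|}{N}\, \big|\, \text{error}(m) - \text{uncertainty}(m) \,\big| .
\]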
Towards Improving Calibration in Object Detection Under Domain Shift
With deep neural network based solutions being more readily incorporated into real-world applications, there is a pressing requirement that predictions by such models, especially in safety-critical environments, be highly accurate and well-calibrated. Although some techniques addressing DNN calibration have been proposed, they are limited to visual classification applications and in-domain predictions. Unfortunately, very little to no attention has been paid to the calibration of DNN-based visual object detectors, which occupy a similar space and importance in many decision-making systems as their visual classification counterparts. In this work, we study the calibration of DNN-based object detection models, particularly under domain shift. To this end, we first propose a new, plug-and-play, train-time calibration loss for object detection (coined TCD). It can be used with various application-specific loss functions as an auxiliary loss to improve detection calibration. Second, we devise a new implicit technique for improving calibration in self-training based domain-adaptive detectors, featuring a new uncertainty quantification mechanism for object detection. We demonstrate that TCD is capable of enhancing calibration by notable margins (1) across different DNN-based object detection paradigms, both for in-domain and out-of-domain predictions, and (2) in different domain-adaptive detectors across challenging adaptation scenarios. Finally, we empirically show that our implicit calibration technique can be used in tandem with TCD during adaptation to further boost calibration in diverse domain-shift scenarios.