
Neural Information Processing Systems

Lemma 5. Let S = (Z_1, ..., Z_n) be a collection of n independent random variables and Φ be an arbitrary random variable defined on the same probability space. Furthermore, each of these summands has zero mean. Given a deterministic algorithm f, we consider the algorithm that adds Gaussian noise to the predictions of f: f_σ(z, x, R) = f(z, x) + ξ, (151) where ξ ∼ N(0, σ²I_d). The figure in the middle repeats the experiment of Figure 1a while making the training algorithm stochastic by randomizing the seed. Table 1: The architecture of the 4-layer convolutional neural network used in MNIST 4 vs. 9 classification tasks.
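The noise mechanism in Eq. (151) is straightforward to sketch in code. The predictor below is a hypothetical linear map standing in for f, and the prediction dimension is inferred from its output; both are illustrative assumptions, not details from the paper.

```python
import numpy as np

def f(z, x):
    # Hypothetical deterministic predictor standing in for f(z, x):
    # a fixed linear map of the input x parameterized by z.
    return z @ x

def f_sigma(z, x, rng, sigma=0.1):
    # Randomized version of f, as in Eq. (151): add isotropic Gaussian
    # noise xi ~ N(0, sigma^2 I_d) to the deterministic prediction.
    prediction = f(z, x)
    xi = rng.normal(loc=0.0, scale=sigma, size=prediction.shape)
    return prediction + xi

rng = np.random.default_rng(0)
z = rng.normal(size=(3, 5))   # stand-in for learned parameters
x = rng.normal(size=5)        # a single input
noisy = f_sigma(z, x, rng, sigma=0.1)
print(noisy.shape)  # (3,)
```

Setting sigma=0 recovers the deterministic algorithm exactly, which is the usual sanity check for this construction.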


A Proofs of the Main Results

Neural Information Processing Systems

This section describes Stein variational gradient descent (SVGD) by Liu and Wang [19]. The overview is meant as supplementary material for Section 5, where we propose to use SVGD for inferring the DiBS posteriors p(Z | D) and p(Z, Θ | D). In contrast to sampling-based MCMC or optimization-based variational inference methods, SVGD iteratively transports a fixed set of particles to closely match a target distribution, akin to the gradient descent algorithm in optimization. We refer the reader to Liu and Wang [19] for additional details. Let p(x) with x ∈ X be a differentiable density that we want to sample from, e.g., to estimate an expectation.
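The particle transport described above can be sketched in a few lines of NumPy. The example below uses a fixed-bandwidth RBF kernel and a standard normal target; the bandwidth, step size, particle count, and target density are illustrative choices, not the settings used for the DiBS posteriors.

```python
import numpy as np

def grad_log_p(X):
    # Score of a standard normal target: grad log p(x) = -x.
    return -X

def svgd_step(X, h=1.0, eps=0.1):
    # One SVGD update: x_i <- x_i + eps * phi(x_i), where
    # phi(x) = (1/n) sum_j [ k(x_j, x) grad log p(x_j) + grad_{x_j} k(x_j, x) ].
    diffs = X[:, None, :] - X[None, :, :]          # (n, n, d); diffs[i, j] = x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / h)   # RBF kernel matrix, (n, n)
    attract = K @ grad_log_p(X)                    # drives particles toward high density
    repulse = (2.0 / h) * np.einsum("ij,ijd->id", K, diffs)  # keeps particles spread out
    return X + eps * (attract + repulse) / X.shape[0]

# Transport 50 particles initialized far from the target toward N(0, I).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(50, 2))
for _ in range(500):
    X = svgd_step(X)
# After the updates, the particle mean should have moved toward 0.
```

The repulsive term is what distinguishes SVGD from running plain gradient ascent on log p independently per particle: it prevents all particles from collapsing onto the mode.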




Adversarial Self-Supervised Contrastive Learning

Neural Information Processing Systems

We validate our method, Robust Contrastive Learning (RoCL), on multiple benchmark datasets, on which it obtains comparable robust accuracy over state-of-the-art supervised adversarial learning methods, and significantly improved robustness against black-box and unseen types of attacks.


Supplementary Material: Bayesian Metric Learning for Uncertainty Quantification in Image Retrieval

Neural Information Processing Systems

The target, or label, is the value that encodes the information we want to learn. Instead, we express the loss in an equivalent but more verbose way. In the previous section, we defined the contrastive loss for the entire dataset (14). This intuition is formalized by the following definition and proposition. Then the loss, as defined in Eq. 14, can be approximated by using the target in Eq. 20. The equality is proven by applying the logic of Eq. 19 two times independently. We highlight that this scaling is linear, and thus is reflected in both first- and second-order derivatives.
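The snippet above refers to a contrastive loss (Eq. 14) that is not reproduced here. As an illustrative stand-in only, a classic pair-based contrastive loss in the style of Hadsell et al. (2006) can be sketched as follows; the margin and the pairing convention are assumptions, not the paper's definition.

```python
import numpy as np

def contrastive_loss(z_i, z_j, y, margin=1.0):
    # Classic pair-based contrastive loss, shown only as an illustrative
    # stand-in for the paper's Eq. 14, which this snippet does not reproduce.
    # y = 1 for a positive (matching) pair, y = 0 for a negative pair.
    d = np.linalg.norm(z_i - z_j)
    return y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2

# Positive pairs are pulled together; negative pairs are pushed past the margin.
a, b = np.array([0.0, 0.0]), np.array([0.3, 0.4])  # distance 0.5
print(contrastive_loss(a, b, y=1))  # ≈ 0.25 (squared distance)
print(contrastive_loss(a, b, y=0))  # ≈ 0.25 ((margin - d)^2)
```

Note that the loss is a plain sum over pairs, so rescaling the dataset-level sum rescales both its first- and second-order derivatives linearly, consistent with the remark above.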