Goto

Collaborating Authors

 fashion-mnist

One-vs-All Classifier

Neural Information Processing Systems

We assume there exists a C ∈ H such that p̂(y|x) = p(y|x), i.e., p(y|x) is realizable; 3. We can compute an empirical risk minimizer, i.e., we can determine a C_S ∈ H which minimizes (1) for a given sample S. In the bottom row example, an experiment analogous to the top row, but with a noisier version of the data, is presented. In practice this requires the inclusion of a small ε > 0 in the denominator to circumvent numerical problems in the logarithmic loss terms. In our experiments this also results in a high aleatoric uncertainty far away from the in-distribution region, as all estimated probabilities uniformly take the lower bound's value ε. In the top row each Gaussian has its own class assigned, resulting in 9 classes, and in the bottom row disconnected Gaussians were assigned to 3 classes in total.
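The ε-floor on the estimated probabilities amounts to clamping them away from zero before taking the logarithm. A minimal numpy sketch of that idea (log_loss, probs, and eps are illustrative names, not taken from the paper):

    import numpy as np

    def log_loss(probs, labels, eps=1e-8):
        # Floor the predicted probability of the true class at eps so that
        # the logarithm stays finite; eps plays the role of the small ε > 0.
        p_true = probs[np.arange(len(labels)), labels]
        return -np.mean(np.log(np.clip(p_true, eps, 1.0)))

Far from the training data, an estimator whose probabilities all collapse to the floor ε produces near-uniform predictions, which is consistent with the saturated aleatoric uncertainty reported above.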

Gradient perturbation: For a parametric function fθ(x) parameterized by θ and a loss function L(fθ(x), y), usual mini-batched first-order optimizers update θ using the mini-batch gradient gt = (1/N) Σi ∇θ L(fθ(xi), yi) over the N examples in the batch.
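Read concretely, gt is the average of the per-example loss gradients over the mini-batch. A minimal numpy sketch for a linear model with squared loss (minibatch_gradient and all variable names are illustrative assumptions, not the paper's code):

    import numpy as np

    def minibatch_gradient(theta, X, y):
        # gt = (1/N) Σi ∇θ L(fθ(xi), yi), here with fθ(x) = x·θ
        # and squared loss L(f, y) = (f - y)² / 2.
        residuals = X @ theta - y        # per-example errors, shape (N,)
        return X.T @ residuals / len(y)  # averaged gradient, same shape as θ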

Neural Information Processing Systems

In addition to the notations defined in Sec. , note that we use a slightly different notation compared to the main text, because it is more convenient to deal with empirical distributions rather than samples when relating to the dual formulation later on. Thus, once we find the optimal f and g, we can obtain Pλ through this primal-dual relationship. Readers can refer to [59] for further details. Under gradient perturbation, the gradient gt is first clipped in L2 norm by a constant, and then noise sampled from N(0, σ²I) is added.
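Under those assumptions, gradient perturbation is an L2 clip of gt followed by isotropic Gaussian noise. A minimal numpy sketch (function and parameter names are illustrative; DP-SGD-style variants instead clip each per-example gradient before averaging):

    import numpy as np

    def perturb_gradient(g_t, clip_norm, sigma, rng=None):
        # Clip g_t to L2 norm at most clip_norm, then add noise ~ N(0, σ²I).
        if rng is None:
            rng = np.random.default_rng()
        scale = min(1.0, clip_norm / max(np.linalg.norm(g_t), 1e-12))
        return g_t * scale + rng.normal(0.0, sigma, size=g_t.shape)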

G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators

Yunhui Long

Neural Information Processing Systems

Figure 3: samples generated by G-PATE with ε = 1, δ = 10⁻⁵.

Require: input x, threshold T, noise parameters σ1 and σ2
1: if max_j {n_j(x)} + N(0, σ1²) ≥ T then
2: Return: argmax_j {n_j(x) + N(0, σ2²)}
3: else
4: Return: ⊥
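A minimal numpy sketch of this noisy-threshold aggregation, assuming votes holds the per-class teacher vote counts n_j(x); the function name and the None return standing in for ⊥ are illustrative choices, not the paper's code:

    import numpy as np

    def noisy_threshold_aggregate(votes, T, sigma1, sigma2, rng=None):
        # Answer only when the top vote count clears a noisy threshold
        # (step 1); otherwise abstain, i.e. the ⊥ branch (step 4).
        if rng is None:
            rng = np.random.default_rng()
        if votes.max() + rng.normal(0.0, sigma1) >= T:
            noisy_votes = votes + rng.normal(0.0, sigma2, size=votes.shape)
            return int(np.argmax(noisy_votes))
        return None

Abstaining on low-consensus queries lets the aggregator spend privacy budget only on answers the teacher ensemble broadly agrees on.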