Gradient perturbation: For a parametric function fθ(x) parameterized by θ and loss function L(fθ(x), y), standard mini-batched first-order optimizers update θ using gradients gt = (1/N) Σᵢ₌₁ᴺ ∇θ L(fθ(xᵢ), yᵢ), where N is the mini-batch size.
–Neural Information Processing Systems
In addition to the notations defined in Sec. , note that we use a slightly different notation compared to the main text, because it is more convenient to work with empirical distributions rather than samples when relating to the dual formulation later on. Thus, once we find the optimal f and g, we can obtain P_λ through this primal-dual relationship; readers can refer to [59] for further details. Under gradient perturbation, the gradient gt is first clipped in L2 norm by a constant, and then noise sampled from N(0, σ²I) is added.
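The clip-then-noise step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `perturbed_gradient`, the array layout (per-example gradients stacked as rows), and the explicit clipping constant `clip_norm` are assumptions introduced here for clarity.

```python
import numpy as np

def perturbed_gradient(per_example_grads, clip_norm, sigma, rng=None):
    """Sketch of gradient perturbation: clip each per-example gradient
    to L2 norm at most `clip_norm`, average over the mini-batch, then
    add Gaussian noise drawn from N(0, sigma^2 I).

    per_example_grads: array of shape (N, d), one gradient per example.
    """
    rng = np.random.default_rng(rng)
    # Per-example L2 norms; avoid division by zero for all-zero gradients.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Average the clipped gradients, then add isotropic Gaussian noise.
    g = clipped.mean(axis=0)
    noise = rng.normal(0.0, sigma, size=g.shape)
    return g + noise
```

With sigma set to 0 the function reduces to plain clipped-gradient averaging, which makes the clipping behavior easy to check in isolation.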