 Gradient Descent





A Adaptations of Algorithm 1 for different problems

Neural Information Processing Systems

We extend Algorithm 1 to stochastic gradient descent (SGD). Algorithm 3 here modifies Algorithm 1 to allow transformations on both parameters and data. ... In this section, we derive the group actions for the test functions and multi-layer neural networks. More details about group theory can be found in textbooks such as Lang (2002). B.1 Continuous symmetry in test functions. B.1.1 Ellipse. Consider the following loss function with a ∈ ℝ ... However, we will only use the two-variable version in the experiments.


Appendix A Gradient Descent and Neural Tangent Kernel Gradient Descent Since we consider the square loss and ...

Neural Information Processing Systems

We provide here a brief overview of reproducing kernel Hilbert spaces (RKHS). More details can be found in Appendix G.2. In this work, we impose the following assumptions. Remark 5. Assumption D.3 can be replaced by an alternative assumption, that is, ... Assumption D.1 is related to the neural network and GD training, where similar settings have been ... Assumption D.2 imposes conditions on the underlying true conditional probability in the non-separable case. This assumption essentially requires that the conditional probability lie within the function class generated by the GD-trained neural networks we consider (and thus can be calibrated).




Checklist 1. For all authors (a)

Neural Information Processing Systems

Do the main claims made in the abstract and introduction accurately reflect the paper's ... Did you discuss any potential negative societal impacts of your work? Did you state the full set of assumptions of all theoretical results? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The code will ... Did you specify all the training details (e.g., data splits, hyperparameters, how they ... Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type ... Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're ... If you used crowdsourcing or conducted research with human subjects... (a) ... We trained backdoored models for 100 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.1 on CIFAR-10 and the ImageNet subset (0.01 on GTSRB), a weight decay of ... The learning rate was divided by 10 at the 20th and the 70th epochs. The details of the backdoor triggers are summarized in Table 5. ASR: attack success rate; CA: clean accuracy.
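The learning-rate schedule in the snippet above (initial rate 0.1, or 0.01 on GTSRB, divided by 10 at the 20th and 70th epochs) can be sketched as a small step function. This is only an illustration, not the authors' code: the function name `step_lr`, its parameters, and the choice to count epochs from 1 and apply each drop starting at the milestone epoch itself are all assumptions.

```python
def step_lr(epoch, base_lr=0.1, milestones=(20, 70), factor=10.0):
    """Return the learning rate for a 1-indexed epoch (hypothetical helper).

    Assumption: the rate is divided by `factor` once for every milestone
    that has been reached, i.e. for every m in `milestones` with epoch >= m.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / (factor ** drops)


if __name__ == "__main__":
    # Print the rate just before and at each milestone for a 100-epoch run.
    for epoch in (1, 19, 20, 69, 70, 100):
        print(epoch, step_lr(epoch))
```

Passing `base_lr=0.01` gives the corresponding schedule for the GTSRB setting. If the drops in the original experiments take effect one epoch later, only the comparison `epoch >= m` would need to change.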


Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent

Neural Information Processing Systems

What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is private?