AITopics | Gradient Descent

We extend Algorithm 1 to stochastic gradient descent (SGD). Algorithm 3 here modifies Algorithm 1 to allow transformations on both parameters and data. In this section, we derive the group actions for the test functions and multi-layer neural networks. More details about group theory can be found in textbooks such as Lang (2002). B.1 Continuous symmetry in test functions. B.1.1 Ellipse. Consider the following loss function with a ∈ R ... However, we will only use the 2-variable version in the experiments.

eigenvector, largest eigenvalue, teleportation, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Appendix A Gradient Descent and Neural Tangent Kernel Gradient Descent Since we consider the square loss and `

Neural Information Processing Systems | Aug-15-2025, 13:28:30 GMT

We provide here a brief overview of reproducing kernel Hilbert space (RKHS). More details can be found in Appendix G.2. In this work, we impose the following assumptions. Remark 5. Assumption D.3 can be replaced by an alternative assumption, that is, Assumption D.1 is related to the neural network and GD training, where similar settings have been Assumption D.2 imposes conditions on the underlying true conditional probability in the non-separable case. This assumption basically requires that the conditional probability is within the function class generated by the GD-trained neural networks we consider (thus can be calibrated).

inequality, probability, square loss, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)

A Theoretical Analysis of Fine-tuning with Linear Teachers

Neural Information Processing Systems | Aug-15-2025, 12:40:50 GMT

Fine-tuning is a common practice in deep learning, achieving excellent generalization results on downstream tasks using relatively little training data.

fine-tuning, international conference, target task, (16 more...)

Country:

North America > United States (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

9f29450d2eb58feb555078bdefe28aa5-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 11:44:57 GMT

graph, placement, policy network, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
(2 more...)

Checklist 1. For all authors (a)

Neural Information Processing Systems | Aug-15-2025, 10:09:23 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's Did you discuss any potential negative societal impacts of your work? Did you state the full set of assumptions of all theoretical results? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The code will Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're If you used crowdsourcing or conducted research with human subjects... (a) We trained backdoored models for 100 epochs using Stochastic Gradient Descent (SGD) with an initial learning rate of 0.1 on CIFAR-10 and the ImageNet subset (0.01 on GTSRB), a weight decay of The learning rate was divided by 10 at the 20th and the 70th epochs. The details of backdoor triggers are summarized in Table 5. ASR: attack success rate; CA: clean accuracy.
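The step schedule described in this excerpt (initial learning rate 0.1, divided by 10 at the 20th and the 70th epochs, 100 epochs in total) can be sketched as a small function. This is an illustrative sketch only, not the authors' code: the function name `step_lr`, its parameter names, and the choice of 0-indexed epochs with the drop taking effect from epoch 20 and epoch 70 onward are all assumptions. The weight decay value is cut off in the excerpt and is not modeled here.

```python
def step_lr(epoch, base_lr=0.1, milestones=(20, 70), factor=10.0):
    """Learning rate for a 0-indexed epoch under a step schedule.

    Hypothetical helper: the indexing convention is an assumption,
    since the excerpt only says "at the 20th and the 70th epochs".
    """
    # Count how many milestones have been reached by this epoch.
    drops = sum(1 for m in milestones if epoch >= m)
    # Divide the base rate by `factor` once per milestone reached.
    return base_lr / (factor ** drops)


if __name__ == "__main__":
    # Print the rate just before and after each assumed milestone.
    for epoch in (0, 19, 20, 69, 70, 99):
        print(epoch, step_lr(epoch))
```

For the GTSRB setting mentioned in the excerpt, the same sketch would be called with `base_lr=0.01`.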

artificial intelligence, cifar-10, machine learning, (17 more...)

Genre: Research Report (0.47)

Industry: Information Technology (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent

Neural Information Processing Systems | Aug-15-2025, 09:31:34 GMT

What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is private?

algorithm, loss function, privacy loss, (13 more...)

Country: Asia > Singapore (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
