AITopics | Gradient Descent

Do the main claims made in the abstract and introduction accurately reflect the paper's ... Did you discuss any potential negative societal impacts of your work? Did you state the full set of assumptions of all theoretical results? If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The code will ... Did you specify all the training details (e.g., data splits, hyperparameters, how they ... Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type ... Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're ... If you used crowdsourcing or conducted research with human subjects...

We trained backdoored models for 100 epochs using Stochastic Gradient Descent (SGD) with an initial learning rate of 0.1 on CIFAR-10 and the ImageNet subset (0.01 on GTSRB), a weight decay of ... The learning rate was divided by 10 at the 20th and the 70th epochs. The details of backdoor triggers are summarized in Table 5. ASR: attack success rate; CA: clean accuracy.
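The step schedule described in this excerpt (initial rate 0.1, divided by 10 at the 20th and 70th epochs, 100 epochs in total) can be sketched in plain Python. This is only an illustration, not the authors' code: the function name `step_lr` and its parameters are hypothetical, and it assumes zero-based epoch counting, so the first drop applies from epoch index 20 onward.

```python
def step_lr(epoch, base_lr=0.1, milestones=(20, 70), factor=10.0):
    """Return the learning rate used during `epoch` (zero-based, an assumption).

    The rate is divided by `factor` once for every milestone already reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / (factor ** drops)


# Schedule for a 100-epoch run with the CIFAR-10 / ImageNet-subset setting.
schedule = [step_lr(e) for e in range(100)]

# The GTSRB setting in the excerpt starts from 0.01 instead of 0.1.
gtsrb_schedule = [step_lr(e, base_lr=0.01) for e in range(100)]
```

In a PyTorch training loop the same shape of schedule would usually come from a built-in scheduler. The pure-Python version is used here so the sketch runs without any dependency.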

artificial intelligence, cifar-10, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.47)

Industry: Information Technology (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)


999df4ce78b966de17aee1dc87111044-Paper.pdf

Neural Information Processing Systems, Feb-9-2026, 11:57:45 GMT

hypernetwork, international conference, neural network, (13 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)


And for any α > 0, the Laplacian satisfies Gλ c1 2α Gλ + 2Md where M = c23α/c1 + c22. 2. If G(x) = ‖g(x)‖² is C¹, then ‖ G(x)‖ c2‖g(x)‖, g(x)ᵀ G(x) c1G(x). Proof. Claim 1. Note Gλ = 2 2Fλgλ and that 1 2 c1I 2Fλ = ...

Neural Information Processing Systems, Feb-9-2026, 11:57:32 GMT

As a remark, using the Krylov-Bogoliubov existence theorem (see Corollary 11.8 of [6]), fixed points of (4) exist as long as one can show {ρt, t ≥ 0} is tight. The learning rate is set differently for each task. Obviously, the HV indicator (Eq. (10)) can also be used as an objective function for optimizing solution sets. For example, [25, 7] greedily add new points to obtain the highest expected HV improvement. However, the landscape of the HV indicator is piece-wise constant (similar to the 0-1 loss in classification) and is difficult to optimize with gradient descent. In particular, for all the dominated points in the solution set, the gradient is zero.
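The zero-gradient remark about dominated points can be checked on a small example. The sketch below is an illustration under stated assumptions, not the paper's Eq. (10): it assumes two objectives, both minimized, and a reference point `ref`; the function name `hypervolume_2d` and all data are made up for the example.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (2 objectives, minimization)."""
    # Only points that strictly improve on the reference point contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:
        # After sorting by f1, a point adds area only if it improves f2.
        if f2 < best_f2:
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


ref = (4, 4)
front = [(1, 3), (2, 2), (3, 1)]   # mutually non-dominated points
dominated = (3, 3)                  # dominated by (2, 2)

hv_front = hypervolume_2d(front, ref)
hv_with_dominated = hypervolume_2d(front + [dominated], ref)

# Nudging the dominated point leaves the indicator unchanged, so a
# finite-difference estimate of its gradient is exactly zero.
nudged = (dominated[0] + 0.1, dominated[1] - 0.1)
hv_nudged = hypervolume_2d(front + [nudged], ref)
finite_difference = hv_nudged - hv_with_dominated
```

Because the dominated point contributes no area, gradient descent on the indicator gives it no direction to move in, which is the difficulty the excerpt describes.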

artificial intelligence, machine learning, seefig, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)


65ccdfe02045fa0b823c5fa7ffd56b66-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 11:46:45 GMT

We show the utility of our method by applying it to gradient descent with shuffling and mini-batch gradient descent, reaffirming key results from existing literature under a unified framework.

artificial intelligence, machine learning, markov chain, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)


65ae674df2fb642518ae8d2b5435e1b8-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 11:38:40 GMT

assumption, complexity, sample complexity, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)


Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes

Neural Information Processing Systems, Feb-9-2026, 11:38:25 GMT

Sharma et al. (2022) provide ... Yang et al. (2022a) integrate Local SGDA with stochastic gradient estimators to eliminate the ... More recently, Zhang et al. (2023) adopt compressed momentum methods with Local SGD to increase the communication efficiency of the algorithm. For centralized nonconvex minimax problems, Yang et al. (2022b) show that, even in deterministic settings, GDA-based methods require timescale separation of the stepsizes for primal and dual updates.
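To make "separate stepsizes for primal and dual updates" concrete, here is a minimal sketch of simultaneous gradient descent ascent (GDA) with two different stepsizes. It is an illustration only: the toy objective f(x, y) = 0.5x² + 2xy − 0.5y², the stepsize values, and all function names are assumptions chosen for the example, and this convex-concave toy does not demonstrate the necessity result cited above for nonconvex problems.

```python
def grad_x(x, y):
    # Partial derivative in x of f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2
    return x + 2.0 * y


def grad_y(x, y):
    # Partial derivative in y of the same toy objective
    return 2.0 * x - y


def gda(x, y, eta_x, eta_y, steps):
    """Simultaneous GDA: descend in x (primal), ascend in y (dual)."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)   # both use the current iterate
        x, y = x - eta_x * gx, y + eta_y * gy
    return x, y


# The primal stepsize is ten times smaller than the dual one (two timescales).
x_final, y_final = gda(1.0, 1.0, eta_x=0.01, eta_y=0.1, steps=500)
```

On this toy problem the iterates approach the saddle point at (0, 0). The ratio `eta_y / eta_x` is the quantity that a timescale-separation condition would constrain.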

artificial intelligence, machine learning, stepsize, (17 more...)

Neural Information Processing Systems

Country: