AITopics | Gradient Descent

The supplementary file is organized as follows: Section 1 restates the assumptions and main theorems on the convergence of parameter iterates and the full gradient; Section 2 is devoted to the proofs of the two main theorems, while Section 3 includes the proofs of supporting lemmas; Section 4 includes additional figures from the numerical study. Under Assumptions 1.1 to 1.3, when m > C for some constant C > 0, we have the following results under two corresponding conditions on sl(m): First we present the following lemma, showing that the loss function has a property similar from strong convexity. For the first case discussed in Lemma 2.1, define eg(θ(k)) = (g(θ(k)))2, and for the second case define eg(θ(k)) = g(θ(k)). Therefore, combining Lemma 2.1, Lemma 2.2 and (7) leads to the following conclusion. Apply(15)inLemma 2.3 with = 12, then for any 0<α<1, with probability at least 1 2m α, we have A11 1 Under this case, we can still apply (15) in Lemma 2.3.

artificial intelligence, machine learning, probability, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Add feedback

1cb524b5a3f3f82be4a7d954063c07e2-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 16:55:25 GMT

dataset, gradient, theorem 3, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Michigan (0.04)
North America > Canada (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Add feedback

OntheConvergenceTheoryofDebiased Model-AgnosticMeta-ReinforcementLearning

Neural Information Processing SystemsFeb-7-2026, 16:24:40 GMT

In particular, using stochastic gradients in MAML update steps is crucial for RL problems since computation of exact gradients requires access to a large number of possible trajectories.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

18085327b86002fc604c323b9a07f997-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 16:24:37 GMT

gradient, policy gradient, sg-mrl, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(2 more...)

Genre: Research Report (0.93)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Add feedback

max k [K] hik(x)>1 B/K1 min i [n ] min l [K] aill(x)/K. Step2. Weassumethathjr(x)attainsthelargestvalueofhik(x)foranyi [n],k [K]. Then hjr(x)>1 min

Neural Information Processing SystemsFeb-7-2026, 15:34:11 GMT

A.1 ImplementationDetails Network Architecture: Inspired by [33], we utilize a pre-trained ResNet-50 [20] as the feature extractor for object recognition tasks (i.e., Office-31 [22], Office-Caltech [18] and Office-Home [46]). Theoverallframeworkis trained under an end-to-end manner via back-propagation. The stochastic gradient descent with momentum value as 0.9 is employed as the network optimizer. The initial learning rates for feature extractor and bottleneck layer are respectively set as 10 3 and 10 2, while the parameters of classifier are frozen. It is exponentially decayed as the training process.

artificial intelligence, machine learning, ptx, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback