AITopics | chain rule


chain rule

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

58b7483ba899e0ce4d97ac5eecf6fa99-Supplemental.pdf

Neural Information Processing Systems, Apr-26-2026, 01:09:41 GMT

artificial intelligence, machine learning, sequence


Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.45)


Optimistic Rates for Multi-Task Representation Learning

Neural Information Processing Systems, Apr-24-2026, 10:09:31 GMT

We study the problem of transfer learning via Multi-Task Representation Learning (MTRL), wherein multiple source tasks are used to learn a good common representation, and a predictor is trained on top of it for the target task. Under standard regularity assumptions on the loss function and task diversity, we provide new statistical rates on the excess risk of the target task, which demonstrate the benefit of representation learning. Importantly, our rates are optimistic, i.e., they interpolate between the standard $O(m^{-1/2})$ rate and the fast $O(m^{-1})$ rate, depending on the difficulty of the learning task, where m is the number of samples for the target task. Besides the main result, we make several new contributions, including giving optimistic rates for excess risk of source tasks (Multi-Task Learning (MTL)), a local Rademacher complexity theorem for MTRL and MTL, as well as a chain rule for local Rademacher complexity for composite predictor classes.
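To see what "interpolating" between the two rates means, consider the generic shape of an optimistic bound from the learning-theory literature: excess risk $\lesssim \sqrt{L^\star C/m} + C/m$, where $L^\star$ is the smallest achievable risk and $C$ is a complexity term. This form and the function names below are assumptions for illustration, not the paper's theorem. The sketch estimates the effective exponent $r$ in $m^{-r}$ by comparing the bound at $m$ and $4m$.

```python
import math

def optimistic_bound(m, L_star, C=1.0):
    """Hypothetical generic optimistic-rate shape (constants and log factors dropped):
    sqrt(L_star * C / m) + C / m, with L_star the best achievable risk."""
    return math.sqrt(L_star * C / m) + C / m

def rate_exponent(L_star, m=10**6, C=1.0):
    # effective exponent r such that the bound scales like m^(-r) between m and 4m
    ratio = optimistic_bound(4 * m, L_star, C) / optimistic_bound(m, L_star, C)
    return -math.log(ratio) / math.log(4)

for L_star in (0.0, 1e-6, 1.0):
    print(L_star, round(rate_exponent(L_star), 3))
```

With $L^\star = 0$ only the $C/m$ term remains and the exponent is 1 (fast rate); when $L^\star$ is large relative to $C/m$ the square-root term dominates and the exponent approaches 1/2; in between, the exponent lies strictly between the two.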

artificial intelligence, complexity, machine learning


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


Optimistic Rates for Multi-Task Representation Learning

Neural Information Processing Systems, Apr-24-2026, 10:09:27 GMT

artificial intelligence, complexity, machine learning


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


Provably Correct Automatic Sub-Differentiation for Qualified Programs

Sham M. Kakade, Jason D. Lee

Neural Information Processing Systems, Mar-14-2026, 13:52:35 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, nonsmooth function


Country: North America > United States (0.93)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


A Technical Proofs. Proof of Proposition 4.1. Using the chain rule, (1), and the definitions of null

Neural Information Processing Systems, Feb-16-2026, 07:26:04 GMT

This appendix presents the technical details of efficiently implementing Algorithm 2. B.1 Computing Intermediate Quantities. We argue that in the setting of neural networks, Algorithm 2 can obtain the intermediate quantities ζ … Algorithm 3 gives a subroutine for computing the necessary scalars used in the efficient squared norm function of the embedding layer. Algorithm 3: Computing the Nonzero Values of n … In the former case, it is straightforward to see that we incur a compute (resp. … F.1 Effect of Batch Size on Fully-Connected Layers. Figure 4 presents numerical results for the same set of experiments as in Subsection 5.1, but for different batch sizes |B| instead of the output dimension q. As in Subsection 5.1, the results in Figure 4 are more favorable to Adjoint than to GhostClip.
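The snippet does not reproduce Algorithm 2 or the Adjoint method. As background for the comparison with GhostClip, the sketch below shows the "ghost norm" identity that GhostClip-style methods use for a fully connected layer: if an example's weight gradient is $\sum_t g_t a_t^\top$ (layer inputs $a_t$, output gradients $g_t$ over $T$ positions), its squared Frobenius norm equals $\sum_{t,t'} \langle a_t, a_{t'}\rangle \langle g_t, g_{t'}\rangle$. Per-example norms can therefore be computed from two $T \times T$ Gram matrices without materializing per-example gradients. Shapes and function names are assumptions for illustration.

```python
import numpy as np

def per_example_grad_sq_norms_direct(A, G):
    """Baseline: materialize each example's weight gradient sum_t g_t a_t^T.

    A: layer inputs, shape (B, T, d_in); G: gradients w.r.t. layer outputs,
    shape (B, T, d_out). Returns squared Frobenius norms, shape (B,).
    """
    grads = np.einsum("bto,bti->boi", G, A)  # (B, d_out, d_in)
    return np.sum(grads ** 2, axis=(1, 2))

def per_example_grad_sq_norms_ghost(A, G):
    """Ghost-norm identity: ||sum_t g_t a_t^T||_F^2 = sum_{t,t'} <a_t,a_t'> <g_t,g_t'>."""
    gram_a = np.einsum("bti,bsi->bts", A, A)  # (B, T, T) input Gram matrices
    gram_g = np.einsum("bto,bso->bts", G, G)  # (B, T, T) output-gradient Gram matrices
    return np.sum(gram_a * gram_g, axis=(1, 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 5))  # batch 4, sequence length 3, d_in 5
G = rng.standard_normal((4, 3, 2))  # d_out 2
print(np.allclose(per_example_grad_sq_norms_direct(A, G),
                  per_example_grad_sq_norms_ghost(A, G)))
```

The ghost version costs $O(BT^2(d_{\text{in}} + d_{\text{out}}))$ and never forms the $B \times d_{\text{out}} \times d_{\text{in}}$ tensor, so it pays off when $T$ is small relative to the layer width.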

artificial intelligence, machine learning, resp


Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)


Appendices A Some Useful Lemmas

Neural Information Processing Systems, Feb-16-2026, 02:50:34 GMT

In this paper, we study several equivalent forms of the generalization error, e.g., Eq. (2). … This lemma is a consequence of Lemma 2.1, additionally using some symmetry properties. Recall Eq. (1) in Lemma 2.1: E … Note that Eq. (2) in the main text follows from the second equation above, which is used to derive individual … Notice that we do not change the definitions of any of the random variables, e.g., … This, as we have already seen in Eq. (5) in the main text, is used to derive hypotheses-conditioned CMI bounds in Section 4. It is easy to see that when … To obtain Eq. (14), we let W … This is used to derive supersample-conditioned CMI bounds in Section 4. It is easy to see that both … Like all the previous information-theoretic bounds, the following lemma is widely used in our paper. We also invoke some other lemmas, given below. It is easy to verify that … We note that the reason we introduce four types of SCH stability in Definition 2.1 is that solely using … The basic setup is as follows. By Lemma A.3, we have E … Recalling Eq. (12) in Lemma A.1 and applying Jensen's inequality to the absolute value function, the first … The proof is nearly the same as the proof of Theorem 3.1, except that now the randomness of the algorithm is given for each DV auxiliary function, so the randomness of … Similar to the proof of Theorem 3.1, we let … We now prove the first bound. … Lemma A.2, we have E … By Lemma A.3, we have E … Recalling Eq. (14) in Lemma A.1 and applying Jensen's inequality to the absolute value function, the first bound is … To prove the second bound, we return to Eq. (20) and take the expectation over … For the second part of Theorem 4.1, notice that it is valid to let … The proof is similar to [18, Theorem 2.1].
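The proofs mention a "DV auxiliary function". Assuming DV refers to the Donsker-Varadhan variational formula, as is standard in information-theoretic generalization bounds, the formula reads $D_{\mathrm{KL}}(P\|Q) = \sup_f \{\mathbb{E}_P[f] - \log \mathbb{E}_Q[e^f]\}$. The sketch below checks it on two small discrete distributions chosen arbitrarily for illustration: the log density ratio attains the supremum, and random choices of $f$ only give lower bounds.

```python
import numpy as np

def kl(p, q):
    # KL(P || Q) for strictly positive discrete distributions
    return float(np.sum(p * np.log(p / q)))

def dv_objective(f, p, q):
    # Donsker-Varadhan objective: E_P[f] - log E_Q[exp(f)]
    return float(np.dot(p, f) - np.log(np.dot(q, np.exp(f))))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

# the optimal auxiliary function is the log density ratio (up to an additive constant)
f_opt = np.log(p / q)
print(dv_objective(f_opt, p, q), kl(p, q))

# any other choice of f yields a lower bound on KL(P || Q)
rng = np.random.default_rng(0)
random_vals = [dv_objective(rng.standard_normal(3), p, q) for _ in range(1000)]
print(max(random_vals) <= kl(p, q))
```

In bounds of this kind, the freedom to pick $f$ is what lets a proof trade an expectation under one distribution for a KL or mutual-information term plus a log-moment-generating term.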

artificial intelligence, inequality, machine learning


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)


Provably Correct Automatic Sub-Differentiation for Qualified Programs

Sham M. Kakade, Jason D. Lee

Neural Information Processing Systems, Feb-12-2026, 07:15:52 GMT

Neural Information Processing Systems http://nips.cc/

differentiation, library, nonsmooth function


Country:

North America > United States > California (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


A Unified Discretization Framework for Differential Equation Approach with Lyapunov Arguments for Convex Optimization

Neural Information Processing Systems, Feb-11-2026, 20:48:03 GMT

The differential equation (DE) approach for convex optimization, which relates optimization methods to specific continuous DEs with rate-revealing Lyapunov functionals, has gained increasing interest since the seminal paper by Su-Boyd-Candès (2014).
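As an illustration of the DE approach, take the ODE from Su-Boyd-Candès (2014), $\ddot X + \frac{3}{t}\dot X + \nabla f(X) = 0$ (the continuous-time limit of Nesterov's accelerated gradient method), with its Lyapunov functional $\mathcal{E}(t) = t^2(f(X) - f^\star) + 2\|X + \tfrac{t}{2}\dot X - x^\star\|^2$. Along solutions, $\dot{\mathcal{E}} = 2t(f - f^\star) - 2t\langle X - x^\star, \nabla f(X)\rangle \le 0$ by convexity, which gives $f(X(t)) - f^\star \le \mathcal{E}(t_0)/t^2$. The sketch below integrates the ODE with classical RK4 for an assumed quadratic objective ($x^\star = 0$, $f^\star = 0$), starting at a small $t_0 > 0$ to avoid the singular coefficient at $t = 0$, and checks both facts numerically. It is not the paper's unified discretization framework; the objective, step size, horizon, and names are illustrative choices.

```python
import numpy as np

LAM = np.array([1.0, 10.0])  # assumed f(x) = 0.5 * sum(LAM * x**2), so x* = 0 and f* = 0

def f(x):
    return 0.5 * float(np.sum(LAM * x ** 2))

def rhs(t, x, v):
    # first-order form of X'' + (3/t) X' + grad f(X) = 0, with grad f(x) = LAM * x
    return v, -(3.0 / t) * v - LAM * x

def energy(t, x, v):
    # E(t) = t^2 (f(X) - f*) + 2 ||X + (t/2) X' - x*||^2
    return t ** 2 * f(x) + 2.0 * float(np.sum((x + 0.5 * t * v) ** 2))

def simulate(x0, t0=0.01, T=10.0, h=2e-3):
    """Classical RK4 from (X(t0), X'(t0)) = (x0, 0); returns final t, final X, and E values."""
    t, x, v = t0, np.asarray(x0, dtype=float), np.zeros(len(x0))
    energies = [energy(t, x, v)]
    for _ in range(int(round((T - t0) / h))):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x = x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v = v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
        energies.append(energy(t, x, v))
    return t, x, energies

t_end, x_end, E = simulate([1.0, -2.0])
print("E nonincreasing:", all(b <= a + 1e-9 for a, b in zip(E, E[1:])))
print("f(X(T)) - f* <= E(t0)/T^2:", f(x_end) <= E[0] / t_end ** 2)
```

RK4 here only serves to approximate the continuous trajectory; the point of the DE approach is that a discretization chosen to preserve a discrete analogue of $\mathcal{E}$ inherits the $O(1/t^2)$ rate.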

artificial intelligence, optimization method, optimization problem


Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)


ec1f764517b7ffb52057af6df18142b7-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 18:17:46 GMT

The events in task s are denoted as $H_s = H_{s,n+1}$, and all the events up to and including stage s are denoted as $H_{1:s} = \bigcup_{s'=1}^{s} H_{s'}$. Let $\propto_X$ denote that the proportionality constant is independent of $X$ (possibly a set). Using the relation between the history-independent and history-dependent entropy terms, we obtain the second inequality. The right-hand side term in the first inequality holds due to Cauchy-Schwarz. The inequality holds due to Weyl's inequality and … This is true by using the upper bounds on $\sigma^2_{\max}(\hat{\Sigma}_{s,t})$ in Lemma 3, and because the function $\sqrt{x/\log(1+ax)}$ for $a>0$ increases with $x$.
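The last step relies on the monotonicity of $\sqrt{x/\log(1+ax)}$. One short way to verify it (our derivation, not quoted from the source) is to show that $\log(1+ax)/x$ is decreasing on $x>0$:

```latex
\text{For } a>0,\ x>0:\quad
\frac{d}{dx}\,\frac{\log(1+ax)}{x}
  = \frac{1}{x^{2}}\left(\frac{ax}{1+ax} - \log(1+ax)\right) < 0,
\quad\text{since } \log(1+u) > \frac{u}{1+u} \text{ for all } u>0.
```

Hence $x/\log(1+ax)$ is increasing in $x$, and so is its square root.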

artificial intelligence, hs 1, information


Technology: Information Technology > Artificial Intelligence (0.46)


7c40c5050bd029a3ea7ff8b01412f735-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 03:44:14 GMT

Additional notation. For a matrix $A \in \mathbb{R}^{d_1 \times d_2}$, $\|A\|_{\mathrm{op}}$ is the operator norm (with respect to Euclidean norms), and $\|A\|_F$ is the Frobenius norm of $A$. … The main intuition behind the HMM considered in this paper comes from the correlation decay phenomenon in graphical models. Informally, we expect one sign flip (i.e., $S_i \neq S_{i+1}$) per $1/\delta$ samples. … To begin the analysis of the estimator in Figure 2, the following lemma is a simple yet key tool for the proof. It establishes the variance of the random gain $S$.
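The "one flip per $1/\delta$ samples" intuition can be checked with a toy model. The sketch below assumes a hidden sign chain in which each step flips independently with probability $\delta$; this is our simplification, not necessarily the paper's HMM, and `sample_signs` is a hypothetical helper. Under this model the expected number of flips among $n$ samples is $(n-1)\delta$, so runs of equal signs last about $1/\delta$ samples on average.

```python
import numpy as np

def sample_signs(n, delta, rng):
    """Hypothetical toy chain: S_1 uniform on {-1, +1}; each subsequent sign
    flips independently with probability delta."""
    s1 = rng.choice([-1, 1])
    flips = rng.random(n - 1) < delta  # True where S_{i+1} != S_i
    # the parity of the number of flips so far fixes the sign at each position
    parity = np.concatenate(([0], np.cumsum(flips) % 2))
    return s1 * np.where(parity == 0, 1, -1)

rng = np.random.default_rng(0)
n, delta = 200_000, 0.01
S = sample_signs(n, delta, rng)
num_flips = int(np.sum(S[1:] != S[:-1]))
print("observed flips:", num_flips, "expected:", (n - 1) * delta)
print("average samples per flip:", n / num_flips, "vs 1/delta =", 1 / delta)
```

Correlations in such a chain decay geometrically, $\mathbb{E}[S_i S_{i+k}] = (1-2\delta)^k$, which is the kind of correlation decay the intuition above appeals to.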

artificial intelligence, divergence, xn1


Technology: Information Technology > Artificial Intelligence (0.47)
