AITopics | relu


relu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Proof of Proposition 2.5

Neural Information Processing Systems, Feb-17-2026, 01:22:08 GMT

Proposition 2.5 is a direct consequence of the following lemma (remember that …). Lemma A.1 (Smooth functions conserved through a given flow). Assume that $\partial h(\theta)\chi(\theta) = 0$ for all $\theta \in \Omega$. … Let us first show the direct inclusion. … Now let us show the converse inclusion. … We recall (cf. Example 2.10 and Example 2.11) that linear and … Assumption 2.9, which we recall reads as: … Theorem 2.14, let us show that (9) holds for standard ML losses.
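To make the lemma's condition concrete: a function h is conserved along a flow when its gradient is orthogonal to the flow's vector field at every point. The sketch below is a toy check under our own assumptions, not the paper's setting: a single ReLU neuron f(x) = v·relu(u·x) trained with squared loss on one data point, flow field chi = -grad L, and candidate conserved function h(u, v) = u² - v². The positive homogeneity of ReLU makes <grad h, chi> vanish identically, while plain gradient descent with a finite step only approximately conserves h (each step rescales it by a factor 1 - O(lr²)). All function names are hypothetical.

```python
# Toy check (our assumption, not the paper's setup): single ReLU neuron
# f(x) = v * relu(u * x), squared loss L = 0.5 * (f(x) - y)**2 on one point,
# flow field chi = -grad L, candidate conserved function h(u, v) = u**2 - v**2.


def relu(t: float) -> float:
    return t if t > 0 else 0.0


def relu_grad(t: float) -> float:
    # Subgradient convention: 0 at t = 0.
    return 1.0 if t > 0 else 0.0


def loss_grad(u: float, v: float, x: float, y: float) -> tuple[float, float]:
    """Return (dL/du, dL/dv) for L = 0.5 * (v * relu(u * x) - y) ** 2."""
    err = v * relu(u * x) - y
    return err * v * relu_grad(u * x) * x, err * relu(u * x)


def h(u: float, v: float) -> float:
    return u * u - v * v


def grad_h_dot_chi(u: float, v: float, x: float, y: float) -> float:
    """<grad h(u, v), chi(u, v)> with chi = -grad L; 0 means h is conserved by the flow."""
    gu, gv = loss_grad(u, v, x, y)
    return (2 * u) * (-gu) + (-2 * v) * (-gv)


def gradient_descent(u: float, v: float, x: float, y: float,
                     lr: float = 1e-3, steps: int = 5000) -> tuple[float, float]:
    """Discretize the gradient flow with plain gradient descent."""
    for _ in range(steps):
        gu, gv = loss_grad(u, v, x, y)
        u, v = u - lr * gu, v - lr * gv
    return u, v


u0, v0, x, y = 0.5, 1.0, 2.0, 3.0
print(grad_h_dot_chi(u0, v0, x, y))  # 0.0: grad h is orthogonal to the flow here
u1, v1 = gradient_descent(u0, v0, x, y)
print(h(u0, v0), h(u1, v1))          # nearly equal; the small drift comes from the finite step
```

The exact orthogonality holds for the continuous-time flow; the discrete iteration is only a small-step approximation of it.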

artificial intelligence, conserved function, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Sampling weights of deep neural networks

Bolager, Erik Lien

Neural Information Processing Systems, Feb-17-2026, 01:03:35 GMT

We introduce a probability distribution, combined with an efficient sampling algorithm, for weights and biases of fully-connected neural networks.
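The snippet does not spell out the distribution, so the following is only a hypothetical illustration of one data-driven way to sample dense-layer weights: each ReLU neuron is built from a pair of training points x1, x2 so that its pre-activation is 0 at x1 and 1 at x2. The uniform choice of pairs, the exact scaling, and the names sample_dense_layer and dense_relu are assumptions made here, not the paper's algorithm.

```python
import random


def sample_dense_layer(X, width, rng):
    """Hypothetical data-driven sampler for one dense ReLU layer.

    For each neuron, draw two distinct points x1, x2 from X and set
        w = (x2 - x1) / ||x2 - x1||**2,   b = -<w, x1>,
    so the pre-activation <w, x> + b equals 0 at x1 and 1 at x2.
    Pairs are drawn uniformly here (an assumption, not the paper's distribution).
    Assumes X contains at least two distinct points.
    """
    weights, biases = [], []
    while len(weights) < width:
        x1, x2 = rng.sample(X, 2)
        diff = [q - p for p, q in zip(x1, x2)]
        sq_norm = sum(d * d for d in diff)
        if sq_norm == 0.0:  # duplicate points define no direction; redraw
            continue
        w = [d / sq_norm for d in diff]
        weights.append(w)
        biases.append(-sum(w_i * x_i for w_i, x_i in zip(w, x1)))
    return weights, biases


def dense_relu(weights, biases, x):
    """Hidden-layer features relu(<w, x> + b) for each sampled neuron."""
    return [max(0.0, sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


rng = random.Random(0)
X = [[0.0, 0.0], [1.0, 2.0], [2.0, -1.0], [-1.0, 1.0]]
W, B = sample_dense_layer(X, width=8, rng=rng)
features = dense_relu(W, B, [0.5, 0.5])
```

Because every hidden weight is set directly from data, no gradient step is needed for this layer; a linear readout on top could then be fit separately, for example by least squares.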

artificial intelligence, machine learning, neural network

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Appendix A: Proof of Proposition 1 in Section 2

Neural Information Processing Systems, Feb-16-2026, 16:14:29 GMT

$\mathrm{ReLU}(T(v - u) + b) = \mathrm{ReLU}(Tv + b)$, where $u \neq 0$, that is, $\mathrm{ReLU}(T\cdot + b)$ is not injective. … By injectivity of $T$, we finally get $a = b$. Remark 2. An example that satisfies (3.1) is the neural operator whose … This construction is given by the combination of "Pairs of projections" discussed in Kato [2013, Section I.4.6] with the idea presented in [Puthawala et al., 2022b, Lemma 29]. … We write operator … G by … Thus, in both cases, H is injective. Remark 4. We make the following observations using Theorem 1: Leaky ReLU is one example that satisfies (ii) in Theorem 1. Puthawala et al. [2022a, Theorem 15] assumes that … We first revisit layerwise injectivity and bijectivity in the case of the finite-rank approximation.
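The non-injectivity of a ReLU layer can be seen on a tiny example. In the sketch below (a hypothetical helper act_layer, with T: R¹ → R² and b = 0), T is injective, yet ReLU(Tv + b) sends every v ≤ 0 to the zero vector, so v and v - u collide for u ≠ 0 whenever both lie in that region. Leaky ReLU is strictly increasing and therefore injective coordinate-wise, so composing it with an injective affine map gives an injective layer.

```python
def act_layer(T, b, v, slope=0.0):
    """Return act(T v + b) with act(z) = z for z > 0 and slope * z otherwise.

    slope = 0.0 gives ReLU; 0 < slope < 1 gives leaky ReLU.
    T is a dense matrix stored as a list of rows (hypothetical helper).
    """
    out = []
    for row, b_i in zip(T, b):
        z = sum(t * x for t, x in zip(row, v)) + b_i
        out.append(z if z > 0 else slope * z)
    return out


T = [[1.0], [2.0]]    # injective linear map R^1 -> R^2
b = [0.0, 0.0]
v, u = [-1.0], [1.0]  # u != 0
v_minus_u = [v[0] - u[0]]

# ReLU: both inputs land on the zero vector, so the layer is not injective.
print(act_layer(T, b, v), act_layer(T, b, v_minus_u))
# Leaky ReLU: the outputs differ, as expected of a strictly increasing activation.
print(act_layer(T, b, v, slope=0.1), act_layer(T, b, v_minus_u, slope=0.1))
```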

artificial intelligence, machine learning, operator

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Globally injective and bijective neural operators

Neural Information Processing Systems, Feb-16-2026, 16:14:25 GMT

Neural operators [Kovachki et al., 2021a,b] are neural networks that take a …

artificial intelligence, machine learning, operator

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > India > Tripura (0.04)
North America > United States > South Dakota (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

766f407b7b4a82135da23b32f0cbaff3-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-14-2026, 19:38:16 GMT

artificial intelligence, gaussian distribution, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

Neural Information Processing Systems, Feb-12-2026, 01:02:10 GMT

Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks?
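As background for the "Clock" in the title: a known mechanism for modular addition, which the snippet itself does not describe, represents each residue a as an angle 2πka/p for a few frequencies k, combines the angles of a and b with the cosine/sine addition formulas, and scores each candidate c by the sum over k of cos(2πk(a + b - c)/p). For prime p and frequencies not divisible by p, that score is maximal exactly at c ≡ a + b (mod p). The sketch below hand-codes this idea; it illustrates the mechanism, not the trained networks analysed in the paper, and the frequencies and the name clock_add are arbitrary assumptions.

```python
import math


def clock_add(a: int, b: int, p: int, freqs=(1, 5, 17)) -> int:
    """Return (a + b) mod p using a hand-built 'Clock'-style mechanism.

    Residues are embedded as angles w*a on circles, w = 2*pi*k/p, for each
    frequency k in freqs (an arbitrary choice here). Angle addition gives
    cos/sin of w*(a + b); each candidate c is scored by sum_k cos(w*(a + b - c)),
    which peaks at c = (a + b) mod p when p is prime and no k is divisible by p.
    """
    # Embed a and b on each circle and combine them by angle addition.
    combined = []
    for k in freqs:
        w = 2 * math.pi * k / p
        ca, sa = math.cos(w * a), math.sin(w * a)
        cb, sb = math.cos(w * b), math.sin(w * b)
        combined.append((w, ca * cb - sa * sb, sa * cb + ca * sb))  # cos, sin of w(a+b)

    best_c, best_score = 0, -math.inf
    for c in range(p):
        # cos(w(a+b)) cos(wc) + sin(w(a+b)) sin(wc) = cos(w(a+b-c))
        score = sum(cos_ab * math.cos(w * c) + sin_ab * math.sin(w * c)
                    for w, cos_ab, sin_ab in combined)
        if score > best_score:
            best_c, best_score = c, score
    return best_c


print(clock_add(100, 50, 113))  # 37, i.e. 150 mod 113
```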

algorithm, artificial intelligence, machine learning

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A. Detailed comparisons with related work

Neural Information Processing Systems, Feb-8-2026, 03:16:40 GMT

In Table 1, we compare our agnostic learning results. Our results in this setting come from Theorem 3.3. We note that the sample complexity for Diakonikolas et al. … To prove Lemma 3.5, we use the following result of Yehudai and Shamir [35]. … We first consider the case when σ satisfies Assumption 3.1.

artificial intelligence, machine learning, ‖x‖₂²

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Efficient and Minimax-optimal In-context Nonparametric Regression with Transformers

Ching, Michelle, Popescu, Ioana, Smith, Nico, Ma, Tianyi, Underwood, William G., Samworth, Richard J.

arXiv.org Machine Learning, Jan-22-2026

We study in-context learning for nonparametric regression with $α$-Hölder smooth regression functions, for some $α>0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $Θ(\log n)$ parameters and $Ω\bigl(n^{2α/(2α+d)}\log^3 n\bigr)$ pretraining sequences can achieve the minimax-optimal rate of convergence $O\bigl(n^{-2α/(2α+d)}\bigr)$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.
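The abstract says transformers can emulate local polynomial estimators. For reference, the sketch below shows a plain (non-transformer) local linear estimator in one dimension, i.e. a degree-1 local polynomial fit with kernel weights, together with the rate n^(-2α/(2α+d)) quoted above. The Gaussian kernel and the names local_linear and minimax_rate are assumptions; classical theory chooses the bandwidth h of order n^(-1/(2α+d)), which is not tuned here.

```python
import math


def local_linear(x0, xs, ys, h):
    """Local linear estimate of E[Y | X = x0] in one dimension.

    Minimizes sum_i K((x_i - x0) / h) * (y_i - a - c * (x_i - x0))**2 over (a, c)
    with a Gaussian kernel K (an assumption) and returns a, the fit at x0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        k = math.exp(-0.5 * (d / h) ** 2)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * y
        t1 += k * d * y
    det = s0 * s2 - s1 * s1  # determinant of the 2x2 normal equations
    if det <= 0.0:
        raise ValueError("degenerate design: need at least two distinct weighted points")
    return (s2 * t0 - s1 * t1) / det  # Cramer's rule for the intercept a


def minimax_rate(n, alpha, d):
    """The mean-squared-error rate n^(-2 alpha / (2 alpha + d)) from the abstract."""
    return n ** (-2 * alpha / (2 * alpha + d))


xs = [i / 10 for i in range(11)]
ys = [2 * x + 1 for x in xs]
print(local_linear(0.35, xs, ys, h=0.2))  # a linear fit reproduces the line: about 1.7
print(minimax_rate(10_000, 1.0, 2))       # about 0.01
```

Local linear fitting reproduces linear functions exactly (up to rounding), which is why the usage example recovers 2·0.35 + 1.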

large language model, machine learning, natural language

2601.15014

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Japan > Honshū > Kansai > Wakayama Prefecture > Wakayama (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.60)

Invertibility of Convolutional Generative Networks from Partial Measurements

Neural Information Processing Systems, Dec-26-2025, 03:12:31 GMT

In this work, we present new theoretical results on convolutional generative neural networks, in particular their invertibility (i.e., the recovery of the input latent code given the network output). The study of the network inversion problem is motivated by image inpainting and by the mode collapse problem in GAN training. Network inversion is highly non-convex, and is thus typically computationally intractable and without optimality guarantees. However, we rigorously prove that, under some mild technical assumptions, the input of a two-layer convolutional generative network can be recovered efficiently from the network output using simple gradient descent. This new theoretical finding implies that the mapping from the low-dimensional latent space to the high-dimensional image space is bijective onto its range (i.e., one-to-one). In addition, the same conclusion holds even when the network output is only partially observed (i.e., with missing pixels). Our theorems hold for two-layer convolutional generative networks with ReLU as the activation function, but we demonstrate empirically that the same conclusion extends to multi-layer networks and to networks with other activation functions, including leaky ReLU, sigmoid and tanh.
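A heavily simplified illustration of inversion from partial measurements: the sketch below is a hypothetical toy, not the paper's two-layer convolutional network. It uses a one-layer generator with a scalar latent z, G(z)_i = leaky(a_i·z + c_i), hides one output entry, and recovers z by gradient descent on the squared error over the observed entries only. Leaky ReLU (one of the activations the abstract reports covering empirically) is chosen so the toy loss keeps a nonzero gradient everywhere; with plain ReLU, entries in the inactive region contribute zero gradient. All names and constants are assumptions.

```python
def leaky(t, slope=0.2):
    return t if t > 0 else slope * t


def leaky_grad(t, slope=0.2):
    return 1.0 if t > 0 else slope


def generator(z, a, c):
    """Hypothetical toy generator G(z)_i = leaky(a_i * z + c_i) with a scalar latent z."""
    return [leaky(a_i * z + c_i) for a_i, c_i in zip(a, c)]


def invert(y, mask, a, c, z0=0.0, lr=0.05, steps=2000):
    """Estimate z from the observed entries of y (mask[i] is True) by gradient
    descent on 0.5 * sum over observed i of (G(z)_i - y_i)**2."""
    z = z0
    for _ in range(steps):
        grad = 0.0
        for a_i, c_i, y_i, seen in zip(a, c, y, mask):
            if seen:
                t = a_i * z + c_i
                grad += (leaky(t) - y_i) * leaky_grad(t) * a_i
        z -= lr * grad
    return z


a, c = [1.0, -1.0, 2.0], [0.0, 0.0, -1.0]
z_true = 0.7
y = generator(z_true, a, c)
mask = [True, False, True]  # the middle "pixel" is missing
z_hat = invert(y, mask, a, c, z0=-3.0)
print(z_hat)  # close to z_true
```

Because each observed entry is a monotone function of z that matches its measurement at z_true, this toy loss has a single minimizer; the paper's non-convex two-layer case needs its own analysis.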

artificial intelligence, machine learning, proceedings

Country: Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.84)

Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks

Neural Information Processing Systems, Dec-25-2025, 08:28:10 GMT

We study the sample complexity of learning one-hidden-layer convolutional neural networks (CNNs) with non-overlapping filters. We propose a novel algorithm called approximate gradient descent for training CNNs, and show that, with high probability, the proposed algorithm with random initialization achieves linear convergence to the ground-truth parameters up to statistical precision. Compared with existing work, our result applies to general non-trivial, monotonic and Lipschitz continuous activation functions, including ReLU, Leaky ReLU, Sigmoid and Softplus. Moreover, our sample complexity improves on existing results in its dependence on the number of hidden nodes and the filter size. In fact, our result matches the information-theoretic lower bound for learning one-hidden-layer CNNs with linear activation functions, suggesting that our sample complexity is tight. Our theoretical analysis is backed up by numerical experiments.
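For orientation, a one-hidden-layer CNN with non-overlapping filters is commonly written as f(x) = Σ_j v_j σ(wᵀx_j), where a single filter w is applied to disjoint patches x_j of the input and v holds the output weights. The forward pass below implements that formulation as an assumption (the snippet does not give the exact parameterization), and the name one_hidden_layer_cnn is hypothetical; the approximate gradient descent algorithm itself is not reproduced.

```python
def relu(t):
    return t if t > 0 else 0.0


def one_hidden_layer_cnn(x, w, v, act=relu):
    """f(x) = sum_j v_j * act(<w, x_j>), where x_j is the j-th non-overlapping
    patch of x with len(w) entries and the filter w is shared by all patches."""
    k = len(w)
    if len(x) != k * len(v):
        raise ValueError("need len(x) == len(w) * number of patches")
    total = 0.0
    for j, v_j in enumerate(v):
        patch = x[j * k:(j + 1) * k]
        total += v_j * act(sum(w_i * x_i for w_i, x_i in zip(w, patch)))
    return total


x = [1.0, -2.0, 3.0, 0.5]  # two patches of length 2
w = [1.0, 1.0]             # shared filter
v = [2.0, -1.0]            # output weights
print(one_hidden_layer_cnn(x, w, v))                   # -3.5
print(one_hidden_layer_cnn(x, w, v, act=lambda t: t))  # -5.5 with a linear activation
```

Passing the activation as a parameter mirrors the abstract's point that the analysis covers a family of monotone, Lipschitz activations rather than ReLU alone.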

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
