AITopics | relu activation function

a922b7121007768f78f770c404415375-Paper-Conference.pdf

Neural Information Processing Systems, Apr-27-2026, 01:20:26 GMT

abstraction, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England (0.28)
North America > United States > New York (0.28)

Genre:

Research Report (0.93)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Polyhedron Attention Module: Learning Adaptive-order Interactions (Appendixes)

Neural Information Processing Systems, Apr-25-2026, 15:31:13 GMT

Contents: A. Deriving Eq. 2; B. The hyperplane set generated by the oblique tree is a superset of that created by the ReLU-activated plain DNN; C. Proof of Theorem 1; D. Proof of Theorem 2; E. Proof of Theorem 3; F. Proof of Theorem 4; G. Implementation Detail.

We consider an $L$-layer ($L \ge 2$) ReLU-activated plain DNN module $f: \mathbb{R}^{n_0} \to \mathbb{R}^{n_L}$ with input $x \in \mathbb{R}^{p}$. Eq. 2 in the main text can be obtained by rewriting P [...]

An oblique tree is a binary tree where each node splits the space by a hyperplane rather than by thresholding a single feature. The tree starts with the root of the full input space $S$, and by recursively splitting $S$, the tree grows deeper. For a $D$-depth ($D \ge 3$) binary tree, there are $2^{D-1} - 1$ internal nodes and $2^{D-1}$ leaf nodes. As shown in Figure 1, each internal and leaf node maintains a sub-space representing a polyhedron in $S$, and each layer of the tree corresponds to a partition of the input space into polyhedrons.
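The node counts and hyperplane splits of such an oblique tree can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names `make_oblique_tree` and `route_to_leaf`, the random hyperplanes, and the heap-style node indexing are all assumptions made for illustration.

```python
import numpy as np

def make_oblique_tree(depth, dim, seed=0):
    """Draw one random hyperplane (w, b) per internal node.

    A complete binary tree with `depth` levels has 2**(depth-1) - 1
    internal nodes and 2**(depth-1) leaves. (Hypothetical helper.)
    """
    rng = np.random.default_rng(seed)
    n_internal = 2 ** (depth - 1) - 1
    W = rng.standard_normal((n_internal, dim))  # hyperplane normals
    b = rng.standard_normal(n_internal)         # hyperplane offsets
    return W, b

def route_to_leaf(x, W, b):
    """Follow the oblique splits w.x + b >= 0 from the root to a leaf.

    Nodes use heap indexing (children of i are 2i+1 and 2i+2), so the
    returned leaf index lies in [0, number_of_leaves).
    """
    n_internal = W.shape[0]
    node = 0
    while node < n_internal:
        go_right = W[node] @ x + b[node] >= 0
        node = 2 * node + 1 + int(go_right)
    return node - n_internal

depth, dim = 4, 3
W, b = make_oblique_tree(depth, dim)
n_internal = W.shape[0]
n_leaves = n_internal + 1
leaf = route_to_leaf(np.zeros(dim), W, b)
```

Each leaf index corresponds to one polyhedron: the intersection of the half-spaces chosen along the path from the root.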

activation state, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

258be18e31c8188555c2ff05b4d542c3-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 04:02:03 GMT

activation function, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Supplementary Material: Repulsive Deep Ensembles are Bayesian. A. Non-identifiable neural networks

Neural Information Processing Systems, Apr-24-2026, 23:34:28 GMT

Deep neural networks are parametric models able to learn complex non-linear functions from few training instances and thus can be deployed to solve many tasks. Their overparameterized architecture, characterized by a number of parameters far larger than the number of training data points, enables them to retain entire datasets even with random labels [84]. Moreover, this overparameterized regime makes neural network approximations of a given function non-unique, in the sense that multiple weight configurations might lead to the same function. Indeed, the output of a feedforward neural network for a fixed input remains unchanged under a set of transformations. For instance, certain weight permutations and sign flips in MLPs leave the output unchanged [9].
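The two symmetries mentioned above can be checked numerically. The sketch below is illustrative only and assumes a one-hidden-layer MLP with a tanh activation (an odd function, which is what makes the sign-flip symmetry hold); the function `mlp` and all shapes are hypothetical choices, not taken from the paper.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer MLP with tanh activation (assumed for illustration)."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 3)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((2, 5)), rng.standard_normal(2)
x = rng.standard_normal(3)

# Permutation symmetry: reorder the hidden units and reorder the
# columns of the output layer in the same way.
perm = rng.permutation(5)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

# Sign-flip symmetry: negate the incoming weights and bias of some hidden
# units and negate their outgoing weights. Valid because tanh(-z) = -tanh(z).
signs = np.array([1.0, -1.0, 1.0, -1.0, -1.0])
y_sign = mlp(x, signs[:, None] * W1, signs * b1, W2 * signs, b2)

y = mlp(x, W1, b1, W2, b2)
same_under_perm = np.allclose(y, y_perm)
same_under_sign = np.allclose(y, y_sign)
```

With a ReLU activation the permutation symmetry still holds, but the sign flip does not, since ReLU is not odd; there the analogous symmetry is positive rescaling of a unit's incoming and outgoing weights.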

artificial intelligence, machine learning, neural network, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

05311655a15b75fab86956663e1819cd-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 11:29:39 GMT

artificial intelligence, convtranspose2d, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

ecc9b6dfdbe374c0a3364ff81cd28642-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 19:41:00 GMT

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
(7 more...)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

f23653913d8390cd4fc1bee8a3238e17-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 20:01:56 GMT

assumption, neural network, prediction error, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

How degenerate is the parametrization of neural networks with the ReLU activation function?

Dennis Maximilian Elbrächter, Julius Berner, Philipp Grohs

Neural Information Processing Systems, Feb-11-2026, 08:36:32 GMT

Neural Information Processing Systems http://nips.cc/

neural network, optimization problem, parametrization, (13 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.15)
North America > United States (0.14)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Polyhedron Attention Module: Learning Adaptive-order Interactions (Appendixes)

Neural Information Processing Systems, Feb-8-2026, 15:35:15 GMT

[...]'s leaf nodes to form [...] Given the definition of our attention in Eq. 9 in the main text, the highest polynomial order is [...] Before providing the proof of Theorem 4, we establish Lemma 1 as its foundation. We follow the principle of Yan et al.'s work [...] Figure 1, we consider two kinds of value functions, i.e., [...] In PAM-Net, we set the number of levels to 2. A grid search is performed over different configurations [...] We conduct grid searches on the dropout rate over {0, 0.1, 0.2} and the initial [...]

activation state, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Proof of Theorem 1 Proof

Neural Information Processing Systems, Feb-7-2026, 22:22:55 GMT

Theorem 6 is stated in terms of Gaussian complexity. Ben-David (2014) has a full proof. M(α) is the linear class following the depth-K neural network. The second term relies on the Lipschitz constant of the DNN, which we bound with the following lemma. Similar results are given by Scaman and Virmaux (2018) and Fazlyab et al. (2019).
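The lemma itself is not reproduced in this excerpt. As a rough illustration of the kind of bound involved, a standard upper bound on the Lipschitz constant of a ReLU network (with respect to the Euclidean norm) is the product of the spectral norms of its weight matrices, because ReLU is 1-Lipschitz. The sketch below assumes a bias-free network and hypothetical helper names; it is not necessarily the bound used in the paper.

```python
import numpy as np

def relu_net(x, weights):
    """Bias-free ReLU network; the last layer is linear (assumed setup)."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

def spectral_norm_product(weights):
    """Product of the largest singular values of the weight matrices.

    This upper-bounds the Lipschitz constant of `relu_net`.
    """
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))

rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 8)),
    rng.standard_normal((2, 8)),
]
bound = spectral_norm_product(weights)

# Empirical check: the slope between random input pairs never exceeds the bound.
ratios = []
for _ in range(200):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    diff = relu_net(x, weights) - relu_net(y, weights)
    ratios.append(np.linalg.norm(diff) / np.linalg.norm(x - y))
max_ratio = max(ratios)
```

The product bound is typically loose; tighter estimates are the subject of the works cited above.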

activation function, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
