AITopics | relu activation

Collaborating Authors

relu activation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Block Coordinate Descent for Neural Networks Provably Finds Global Minima

Neural Information Processing SystemsJun-14-2026, 06:40:36 GMT

In this paper, we consider a block coordinate descent (BCD) algorithm for training deep neural networks and provide a new global convergence guarantee under strictly monotonically increasing activation functions. While existing works demonstrate convergence to stationary points for BCD in neural networks, our contribution is the first to prove convergence to global minima, ensuring arbitrarily small loss. We show that the loss with respect to the output layer decreases exponentially while the loss with respect to the hidden layers remains well-controlled. Additionally, we derive generalization bounds using the Rademacher complexity framework, demonstrating that BCD not only achieves strong optimization guarantees but also provides favorable generalization performance. Moreover, we propose a modified BCD algorithm with skip connections and non-negative projection, extending our convergence guarantees to ReLU activation, which are not strictly monotonic. Empirical experiments confirm our theoretical findings, showing that the BCD algorithm achieves a small loss for strictly monotonic and ReLU activations.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

d79c1390baa2e4835586b094d82e5ffb-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-29-2026, 22:52:09 GMT

artificial intelligence, machine learning, neural-pi, (18 more...)

Neural Information Processing Systems

Industry: Energy > Power Industry (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Add feedback

ATraining Regime

Neural Information Processing SystemsApr-25-2026, 23:24:17 GMT

A.1 Implementation of the GPs We use the GPyTorch4 package for the computations of GPs and their kernels. The NN linear kernel is implemented in all experiments as a 1-layer MLP with ReLU activations and hidden dimension 16. For the Spectral Mixture Kernel, we use 4 mixtures. A.2 Sines Dataset For the first experiments on sines functions, we use the dataset from [9]. For each task, the input points x are sampled from the range [ 5,5], and the target values y are obtained by applying y = Asin(x ')+, where the amplitude A and phase ' are drawn uniformly at random from ranges [0.1,5] and [0, ], respectively.

artificial intelligence, dataset, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Add feedback

54b2b21af94108d83c2a909d5b0a6a50-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 23:22:27 GMT

Predicted sequences of a moving pendulum conditioned on z1 q(z1|x1:5,u1:4) or, in case of the EKVAE, on z1 p(z1|a1:5,u1:4), where the auxiliary variables are obtained through a1:5 q(a1:5|x1:5). The average prediction accuracy, measured by the MSE, can be found in Tab.

artificial intelligence, machine learning, relu activation, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

Add feedback

Empowering Convolutional Neural Networks with MetaSin Activation

Neural Information Processing SystemsApr-25-2026, 17:32:41 GMT

RELU networks have remained the default choice for models in the area of image prediction despite their well-established spectral bias towards learning low frequencies faster, and consequently their difficulty of reproducing high frequency visual details. As an alternative, sinnetworks showed promising results in learning implicit representations of visual data. However training these networks in practically relevant settings proved to be difficult, requiring careful initialization, dealing with issues due to inconsistent gradients, and a degeneracy in local minima. In this work, we instead propose replacing a baseline network's existing activations with a novel ensemble function with trainable parameters. The proposed METASIN activation can be trained reliably without requiring intricate initialization schemes, and results in consistently lower test loss compared to alternatives. We demonstrate our method in the areas of Monte-Carlo denoising and image resampling where we set new state-of-the-art through a knowledge distillation based training procedure. We present ablations on hyper-parameter settings, comparisons with alternative activation function formulations, and discuss the use of our method in other domains, such as image classification.

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
North America > Canada > Ontario (0.28)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

204904e461002b28511d5880e1c36a0f-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 01:33:23 GMT

Similarly to [6], we consider that all environments have the same underlying Structural Causal Model (SCM) and that the different environments correspond to different interventions on the SCM. We provide here the formal definition for SCMs and interventions. We say that Xi causes Xj if Xi 2Pa(Xj). Definition A.2. (Intervention) [6]: Consider a SCMC =( S,N). An intervention e on C consists of replacing one or several of its structural equations to obtain an intervened SCMCe =( Se,N e) with structural equations: Sej: Xej fj(Pa(Xej),N ej), for j =1,...m (11) The variable Xe is intervened on if Si 6= Sei or Ni 6= Nei .

artificial intelligence, different environment, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.91)

Add feedback

0ff54b4ec4f70b3ae12c8621ca8a49f4-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 14:57:28 GMT

artificial intelligence, machine learning, relaxation, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.48)

Add feedback

Appendix AAnalysis of variance of uncertainty estimators

Neural Information Processing SystemsApr-24-2026, 12:33:36 GMT

We list the raw data sources used across all experiments in Table 17: the MNIST dataset (Creative Commons Attribution-Share Alike 3.0 license), the arithmetic expressions dataset from Kusner et al. [4], and the ZINC data (see also https://zinc.docking.org/)

artificial intelligence, estimator, machine learning, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Experimental Setup

Neural Information Processing SystemsApr-24-2026, 11:07:34 GMT

We provide an extended version of the Experimental Setup from Section 5 below. Linear Model This domain involves learning a linear model when the underlying mapping between features and predictions is cubic. Concretely, the aim is to choose the top B =1 out of N = 50 resources using a linear model. The fact that the features can be seen as 1-dimensional allows us to visualize the learned models (as seen in Figure 4). Predict: Given a feature xn U[0,1], use a linear model to predict the utility ˆyof choosing resource n, where the true utility is given by yn = 10x3n 6.5xn.

artificial intelligence, machine learning, prediction, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Add feedback

Filters

Collaborating Authors

relu activation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Block Coordinate Descent for Neural Networks Provably Finds Global Minima

d79c1390baa2e4835586b094d82e5ffb-Supplemental-Conference.pdf

ATraining Regime

54b2b21af94108d83c2a909d5b0a6a50-Supplemental.pdf

4c07fe24771249c343e70c32289c1192-Supplemental.pdf

Empowering Convolutional Neural Networks with MetaSin Activation

204904e461002b28511d5880e1c36a0f-Supplemental.pdf

0ff54b4ec4f70b3ae12c8621ca8a49f4-Supplemental-Conference.pdf

Appendix AAnalysis of variance of uncertainty estimators

Experimental Setup