How Does Variance Shape the Regret in Contextual Bandits?

Neural Information Processing Systems

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax regret bounds, we show that the eluder dimension $d_{\text{elu}}$, a measure of the complexity of the function class, plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of $\Omega(\sqrt{\min\{A, d_{\text{elu}}\}\Lambda} + d_{\text{elu}})$ is unavoidable when $d_{\text{elu}} \leq \sqrt{AT}$, where $A$ is the number of actions, $T$ is the total number of rounds, and $\Lambda$ is the total variance over $T$ rounds. For the $A \leq d_{\text{elu}}$ regime, we derive a nearly matching upper bound $\tilde{O}(\sqrt{A\Lambda} + d_{\text{elu}})$ for the special case where the variance is revealed at the beginning of each round.






Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric

Bigarella, Enda D. V.

arXiv.org Artificial Intelligence

This document proposes a parametric activation function (ac.f.) aimed at improving multidimensional nonlinear data regression. It is established knowledge that nonlinear ac.f.'s are required for learning nonlinear datasets. This work shows that the smoothness and gradient properties of the ac.f. further impact the performance of large neural networks in terms of overfitting and sensitivity to model parameters. Smooth but vanishing-gradient ac.f.'s such as ELU or SiLU (Swish) have limited performance, and non-smooth ac.f.'s such as ReLU and Leaky-ReLU impart discontinuities to the trained model. Improved performance is demonstrated with a smooth "Leaky Exponential Linear Unit" whose non-zero gradient can be trained. A novel diffusion-loss metric is also proposed to gauge the performance of the trained models in terms of overfitting.
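The abstract does not give the LELU's functional form, so the following PyTorch sketch is only one plausible reading: an ELU-style smooth branch for negative inputs combined with a trainable leak slope. The names `alpha` and `beta` and the exact formula are assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn as nn

class LELU(nn.Module):
    """Hypothetical parametric Leaky Exponential Linear Unit (sketch).

    One plausible form consistent with the abstract: an ELU-like smooth
    branch for x <= 0 plus a trainable leak term, so the gradient never
    vanishes. The paper's exact parameterization may differ.
    """

    def __init__(self, alpha: float = 1.0, beta: float = 0.01):
        super().__init__()
        # Both shape parameters are trainable, matching the abstract's
        # "non-zero gradient that can be trained".
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # f(x) = x                               for x > 0
        # f(x) = alpha * (exp(x) - 1) + beta * x for x <= 0
        # clamp keeps exp() numerically safe on the positive branch,
        # which torch.where still evaluates.
        neg = self.alpha * (torch.exp(torch.clamp(x, max=0.0)) - 1.0) + self.beta * x
        return torch.where(x > 0, x, neg)
```

Note that this sketch does not enforce gradient continuity at zero (that would require alpha + beta = 1); it is only meant to show how a smooth negative branch and a trainable leak can coexist in one activation.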


How Does Variance Shape the Regret in Contextual Bandits?

Jia, Zeyu, Qian, Jian, Rakhlin, Alexander, Wei, Chen-Yu

arXiv.org Machine Learning

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension $d_\text{elu}$, a complexity measure of the function class, plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of $\Omega(\sqrt{\min\{A,d_\text{elu}\}\Lambda}+d_\text{elu})$ is unavoidable when $d_{\text{elu}}\leq\sqrt{AT}$, where $A$ is the number of actions, $T$ is the total number of rounds, and $\Lambda$ is the total variance over $T$ rounds. For the $A\leq d_\text{elu}$ regime, we derive a nearly matching upper bound $\tilde{O}(\sqrt{A\Lambda}+d_\text{elu})$ for the special case where the variance is revealed at the beginning of each round. (2) Strong adversary: The adversary sets the reward variance after observing the learner's action. We show that a regret of $\Omega(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ is unavoidable when $\sqrt{d_\text{elu}\Lambda}+d_\text{elu}\leq\sqrt{AT}$. In this setting, we provide an upper bound of order $\tilde{O}(d_\text{elu}\sqrt{\Lambda}+d_\text{elu})$. Furthermore, we examine the setting where the function class additionally provides distributional information of the reward, as studied by Wang et al. (2024). We demonstrate that the regret bound $\tilde{O}(\sqrt{d_\text{elu}\Lambda}+d_\text{elu})$ established in their work is unimprovable when $\sqrt{d_{\text{elu}}\Lambda}+d_\text{elu}\leq\sqrt{AT}$. However, with a slightly different definition of the total variance and with the assumption that the reward follows a Gaussian distribution, one can achieve a regret of $\tilde{O}(\sqrt{A\Lambda}+d_\text{elu})$.
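For quick reference, the bounds quoted in this abstract can be collected in one display. This is purely a restatement of the results above, with $\Lambda$ the total reward variance over $T$ rounds:

```latex
% Restatement of the bounds quoted in the abstract above (requires amsmath).
\begin{align*}
\text{Weak adversary:}
  &\ \Omega\bigl(\sqrt{\min\{A, d_\text{elu}\}\,\Lambda} + d_\text{elu}\bigr)
   \quad \text{when } d_\text{elu} \le \sqrt{AT}, \\
  &\ \tilde{O}\bigl(\sqrt{A\Lambda} + d_\text{elu}\bigr)
   \quad \text{for } A \le d_\text{elu},\ \text{variance revealed each round.} \\
\text{Strong adversary:}
  &\ \Omega\bigl(\sqrt{d_\text{elu}\Lambda} + d_\text{elu}\bigr)
   \quad \text{when } \sqrt{d_\text{elu}\Lambda} + d_\text{elu} \le \sqrt{AT}, \\
  &\ \tilde{O}\bigl(d_\text{elu}\sqrt{\Lambda} + d_\text{elu}\bigr). \\
\text{Known reward distribution:}
  &\ \tilde{O}\bigl(\sqrt{d_\text{elu}\Lambda} + d_\text{elu}\bigr) \text{ is tight;}
   \ \tilde{O}\bigl(\sqrt{A\Lambda} + d_\text{elu}\bigr) \text{ under Gaussian rewards.}
\end{align*}
```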


Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data

Pourkamali-Anaraki, Farhad, Nasrin, Tahamina, Jensen, Robert E., Peterson, Amy M., Hansen, Christopher J.

arXiv.org Artificial Intelligence

A pivotal aspect in the design of neural networks lies in selecting activation functions, crucial for introducing nonlinear structures that capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, like image classification problems, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty in settings characterized by limited data availability. This research aims to address these gaps by investigating the use of two types of adaptive activation functions. These functions incorporate shared and individual trainable parameters per hidden layer and are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
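To make the shared-versus-individual distinction concrete, here is a minimal PyTorch sketch (my illustration, not the authors' code): each hidden layer gets an ELU whose alpha is either one scalar for the whole layer (shared) or one value per neuron (individual). Layer widths and names are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AdaptiveELU(nn.Module):
    """ELU with trainable alpha: a single scalar shared across the layer,
    or one alpha per neuron ('individual'), the two variants contrasted
    in the abstract."""

    def __init__(self, width: int, individual: bool = True):
        super().__init__()
        n = width if individual else 1  # per-neuron vs. shared parameter
        self.alpha = nn.Parameter(torch.ones(n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ELU(x) = x if x > 0 else alpha * (exp(x) - 1);
        # alpha broadcasts over the batch dimension.
        return torch.where(x > 0, x,
                           self.alpha * (torch.exp(torch.clamp(x, max=0.0)) - 1.0))

# Illustrative small regression net for a low-data regime.
net = nn.Sequential(
    nn.Linear(4, 32), AdaptiveELU(32, individual=True),   # per-neuron alphas
    nn.Linear(32, 32), AdaptiveELU(32, individual=False),  # one shared alpha
    nn.Linear(32, 1),
)
```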


Comparison of non-linear activation functions for deep neural networks on MNIST classification task

Pedamonti, Dabal

arXiv.org Machine Learning

Activation functions play a key role in neural networks, so it is fundamental to understand their advantages and disadvantages in order to achieve better performance. This paper first introduces common types of nonlinear activation functions that are alternatives to the well-known sigmoid function and then evaluates their characteristics. Deeper neural networks are also analysed, because they improve final performance compared to shallower networks. Since performance also depends strongly on weight initialisation, the effect of drawing weights from Gaussian and uniform distributions is analysed, with particular attention to how the number of incoming and outgoing connections to a node influences the whole network.
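The closing point, scaling the weight distribution by a node's incoming and outgoing connections, corresponds in its standard form to Xavier/Glorot initialization. A minimal NumPy sketch of that scheme (my illustration under that assumption, not the paper's code):

```python
import numpy as np

def glorot_init(fan_in: int, fan_out: int, dist: str = "gaussian") -> np.ndarray:
    """Weight matrix scaled by incoming (fan_in) and outgoing (fan_out)
    connection counts, as discussed in the abstract."""
    if dist == "gaussian":
        # Variance 2/(fan_in + fan_out) keeps activation and gradient
        # magnitudes roughly constant across layers.
        std = np.sqrt(2.0 / (fan_in + fan_out))
        return np.random.normal(0.0, std, size=(fan_in, fan_out))
    # Uniform variant with the same variance as the Gaussian case.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

W = glorot_init(784, 128)  # e.g. first layer of an MNIST classifier
```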