AITopics | leaky-relu


Recently, there has been a growing focus on determining the minimum width required for deep, narrow Multi-Layer Perceptrons (MLPs) to achieve the universal approximation property. One particularly challenging case is approximating a continuous function under the uniform norm, as indicated by the significant gap between the known lower and upper bounds. To address this problem, we propose a framework that reduces finding the minimum width for deep, narrow MLPs to determining a purely geometrical function denoted $w(d_x, d_y)$. This function depends solely on the input and output dimensions, $d_x$ and $d_y$, respectively. Two key steps support this framework. First, we demonstrate that deep, narrow MLPs, when provided with a small additional width, can approximate a $C^2$-diffeomorphism. Second, using this result, we prove that $w(d_x, d_y)$ equals the optimal minimum width required for deep, narrow MLPs to achieve universality. By employing this framework and the Whitney embedding theorem, we provide an upper bound for the minimum width, given by $\max(2d_x+1, d_y) + \alpha(\sigma)$, where $0 \leq \alpha(\sigma) \leq 2$ is a constant depending on the activation function $\sigma$. Furthermore, we provide a lower bound of $4$ for the minimum width when the input and output dimensions are both equal to two.
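As a rough illustration of the upper bound stated in this abstract, the sketch below evaluates $\max(2d_x+1, d_y) + \alpha(\sigma)$ for given dimensions. The function name `width_upper_bound` is hypothetical, and `alpha` is passed in by hand because the abstract only states that it lies between 0 and 2 and depends on the activation function.

```python
def width_upper_bound(d_x: int, d_y: int, alpha: int) -> int:
    """Evaluate max(2*d_x + 1, d_y) + alpha, the upper bound quoted in the abstract.

    alpha stands in for alpha(sigma); the abstract only says 0 <= alpha <= 2,
    so the caller must supply it (assumption: we treat it as an integer here).
    """
    if d_x < 1 or d_y < 1:
        raise ValueError("dimensions must be positive")
    if not 0 <= alpha <= 2:
        raise ValueError("alpha must lie in [0, 2]")
    return max(2 * d_x + 1, d_y) + alpha


if __name__ == "__main__":
    # Case d_x = d_y = 2, where the abstract also gives a lower bound of 4.
    for alpha in (0, 1, 2):
        print(alpha, width_upper_bound(2, 2, alpha))
```

For $d_x = d_y = 2$ the bound ranges from 5 to 7 depending on `alpha`, which sits above the lower bound of 4 quoted for that case.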

activation function, continuous function, narrow mlp, (15 more...)

arXiv.org Artificial Intelligence

2308.15873

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Minimum width for universal approximation using ReLU networks on compact domain

Kim, Namjun, Min, Chanho, Park, Sejun

arXiv.org Machine Learning, Sep-19-2023

Understanding what neural networks can and cannot do is a fundamental problem in the study of their expressive power. Initial approaches to this problem mostly focused on depth-bounded networks. For example, one line of research studies the size a two-layer neural network needs to memorize (i.e., perfectly fit) an arbitrary training dataset and shows that a number of parameters proportional to the dataset size is necessary and sufficient for various activation functions (Baum, 1988; Huang and Babri, 1998). Another important line of work investigates the class of functions that can be approximated by two-layer networks. Classical results in this field, represented by the universal approximation theorem, show that two-layer networks using a non-polynomial activation function are dense in the space of continuous functions on compact domains (Hornik et al., 1989; Cybenko, 1989; Leshno et al., 1993; Pinkus, 1999). Along with the success of deep learning, the expressive power of deep neural networks has also been studied. As in the classical depth-bounded results, several works have shown that deep neural networks with bounded width can memorize an arbitrary training dataset (Yun et al., 2019; Vershynin, 2020) and can approximate any continuous function (Lu et al., 2017; Hanin and Sellke, 2017). Intriguingly, it has also been shown that deeper networks can be more expressive than shallow ones. For example, Telgarsky (2016), Eldan and Shamir (2016), and Daniely (2017) show that there is a class of functions that can be approximated by deep width-bounded networks with a small number of parameters but cannot be approximated by shallow networks without extremely large width.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

2309.10402

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning

Kumar, Ravin

arXiv.org Artificial Intelligence, Mar-10-2023

Activation functions introduce non-linearity into deep neural networks. This non-linearity helps neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions have been developed and are used depending on the type of problem. ReLU's variants, SWISH, and MISH are go-to activation functions. MISH is considered to have similar or even better performance than SWISH, and much better performance than ReLU. In this paper, we propose an activation function named APTx, which behaves similarly to MISH but requires fewer mathematical operations to compute. The lower computational requirements of APTx speed up model training and thus also reduce the hardware requirements of the deep learning model.
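The abstract does not state the APTx formula. The sketch below assumes the form $(\alpha + \tanh(\beta x)) \cdot \gamma x$ with defaults $\alpha = 1$, $\beta = 1$, $\gamma = 0.5$, which is our recollection of the paper's definition and should be checked against it. MISH and SWISH are written in their commonly used forms for comparison; all function names here are illustrative.

```python
import math


def aptx(x: float, alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.5) -> float:
    """Assumed APTx form: (alpha + tanh(beta * x)) * gamma * x."""
    return (alpha + math.tanh(beta * x)) * gamma * x


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """MISH: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def swish(x: float) -> float:
    """SWISH: x * sigmoid(x), with sigmoid written via tanh for stability."""
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


if __name__ == "__main__":
    # Print the three activations side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  aptx={aptx(x):+.4f}  mish={mish(x):+.4f}  swish={swish(x):+.4f}")
```

Under the assumed defaults, `aptx(x)` reduces algebraically to `x * sigmoid(2 * x)`, so it needs a single `tanh` evaluation, whereas MISH as written above evaluates an exponential, a logarithm, and a `tanh`.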

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.51483/IJAIML.2.2.2022.56-61

2209.06119

Country:

Asia > Singapore (0.05)
Asia > India > Uttar Pradesh (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Empirical study of the modulus as activation function in computer vision applications

Vallés-Pérez, Iván, Soria-Olivas, Emilio, Martínez-Sober, Marcelino, Serrano-López, Antonio J., Vila-Francés, Joan, Gómez-Sanchís, Juan

arXiv.org Artificial Intelligence, Jan-14-2023

In this work we propose a new non-monotonic activation function: the modulus. The majority of the reported research on nonlinearities focuses on monotonic functions. We empirically demonstrate that, on computer vision tasks, models using the modulus activation function generalize better than with other nonlinearities, with up to a 15% accuracy increase on CIFAR100 and 4% on CIFAR10 relative to the best of the benchmark activations tested. With the proposed activation function, the vanishing gradient and dying neuron problems disappear, because the derivative of the activation function is always 1 or -1. The simplicity of the proposed function and its derivative makes this solution especially suitable for TinyML and hardware applications.
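The gradient claim in this abstract can be illustrated with a minimal sketch that compares the derivative of the modulus $|x|$ with those of ReLU and leaky ReLU. The pre-activation values, the leaky slope of 0.01, and the choice of derivative +1 at $x = 0$ are all assumptions made for illustration, not taken from the paper.

```python
def modulus(x: float) -> float:
    """Modulus activation: |x|."""
    return abs(x)


def modulus_grad(x: float) -> float:
    """Derivative of |x|: +1 or -1 (convention assumed here: +1 at x == 0)."""
    return 1.0 if x >= 0 else -1.0


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    """Derivative of leaky ReLU: 1 for positive inputs, `slope` otherwise."""
    return 1.0 if x > 0 else slope


def chain_gradient(grad_fn, pre_activations) -> float:
    """Product of activation derivatives along a chain of layers.

    This ignores the weight matrices and isolates only the factor
    contributed by the activation function.
    """
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product


if __name__ == "__main__":
    # Hypothetical pre-activations at four successive layers.
    pre = [-2.0, 0.5, -1.0, 3.0]
    print("modulus   :", chain_gradient(modulus_grad, pre))
    print("relu      :", chain_gradient(relu_grad, pre))
    print("leaky relu:", chain_gradient(leaky_relu_grad, pre))
```

The activation factor for the modulus always has magnitude 1, while ReLU contributes a zero as soon as one pre-activation is negative and leaky ReLU shrinks the product by its slope at each negative pre-activation.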

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2301.05993

Country: