sigmoidal function


Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions

Ganesh, Anand, Bose, Babhrubahan, Rajagopalan, Anand

arXiv.org Artificial Intelligence

We construct four Schauder bases for the space $C[0,1]$, one using ReLU functions, another using Softplus functions, and two more using sigmoidal versions of the ReLU and Softplus functions. This establishes the existence of a basis using these functions for the first time, and improves on the universal approximation property associated with them. We also show an $O(\frac{1}{n})$ approximation bound based on our ReLU basis, and a negative result on constructing multivariate functions using finite combinations of ReLU functions.
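
The abstract does not spell out the construction, but the classical Faber–Schauder basis of C[0,1] is built from triangular "hat" functions, and each hat is a finite combination of ReLU functions. A minimal numpy sketch of that building block (the breakpoints a, b, c are illustrative, not the paper's exact basis):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x, a, b, c):
    """Triangular hat supported on [a, c] with peak 1 at b,
    written as a finite combination of three ReLU functions."""
    return (relu(x - a) / (b - a)
            - relu(x - b) * (1.0 / (b - a) + 1.0 / (c - b))
            + relu(x - c) / (c - b))

x = np.linspace(0.0, 1.0, 5)
print(hat(x, 0.0, 0.5, 1.0))  # [0, 0.5, 1, 0.5, 0] up to floating point
```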


Convergence Analysis of Max-Min Exponential Neural Network Operators in Orlicz Space

Pradhan, Satyaranjan, Soren, Madan Mohan

arXiv.org Artificial Intelligence

In this work, we propose a Max-Min approach for approximating functions using exponential neural network operators. We extend this framework to develop Max-Min Kantorovich-type exponential neural network operators and investigate their approximation properties. We study both pointwise and uniform convergence for univariate functions. To analyze the order of convergence, we use the logarithmic modulus of continuity and estimate the corresponding rate of convergence. Furthermore, we examine the convergence behavior of the Max-Min Kantorovich-type exponential neural network operators in the Orlicz space setting. We also provide graphical representations illustrating the approximation error for suitable kernels and sigmoidal activation functions.


Parallel Layer Normalization for Universal Approximation

Ni, Yunhao, Liu, Yuhe, Sun, Wenxin, Tang, Yitong, Guo, Yuxin, Feng, Peilin, Wu, Wenjun, Huang, Lei

arXiv.org Machine Learning

The universal approximation theorem (UAT) is a fundamental result for deep neural networks (DNNs), demonstrating their capacity to represent and approximate any function. Existing analyses and proofs of the UAT consider traditional networks built only from linear layers and nonlinear activation functions, omitting the normalization layers commonly employed to ease the training of modern networks. This paper studies the UAT of DNNs with normalization layers for the first time. We theoretically prove that an infinitely wide network -- composed solely of parallel layer normalization (PLN) and linear layers -- has universal approximation capacity. Additionally, we investigate the minimum number of neurons required to approximate $L$-Lipschitz continuous functions with a single-hidden-layer network. We theoretically compare the approximation capacity of PLN with that of traditional activation functions. Unlike traditional activation functions, PLN can act as both an activation function and a normalization in deep neural networks at the same time. We also find that PLN can improve performance when replacing LN in transformer architectures, which reveals the potential of PLN in neural architecture design.
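
The abstract does not define PLN in detail. A minimal numpy sketch of one plausible reading, layer normalization applied independently to parallel groups of neurons so that it can serve as the network's nonlinearity, is given below; the group size and shapes are illustrative, not the paper's exact formulation:

```python
import numpy as np

def parallel_layer_norm(h, num_groups, eps=1e-5):
    """One plausible reading of PLN: split the hidden vector into
    parallel groups and layer-normalize each group independently,
    so the operation is nonlinear and can replace an activation."""
    batch, width = h.shape
    g = h.reshape(batch, num_groups, width // num_groups)
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(batch, width)

h = np.random.randn(4, 8)
out = parallel_layer_norm(h, num_groups=4)  # LayerNorm over groups of 2 neurons
```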


An elementary proof of a universal approximation theorem

Monico, Chris

arXiv.org Artificial Intelligence

There are several versions of universal approximation theorems known, including the very well-known ones from [1, 2, 3]. Each of them states that some collection of neural networks is dense in some space of continuous functions with respect to the uniform norm. In this short note, we present what we believe to be a new and atypically elementary proof of one such theorem. If σ is a 0-1 squashing function (a.k.a. a sigmoidal function), we show that the collection of neural networks with three hidden layers and activation function σ (except at the output) is dense in the space C(K) of real-valued continuous functions on a compact set $K \subset \mathbb{R}^n$.
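
As a concrete, much weaker illustration of the density statement (not of the three-hidden-layer proof itself), here is a minimal numpy sketch that approximates a continuous function on [0, 1] by a single layer of steep logistic units; the target function, number of units, and steepness are illustrative choices:

```python
import numpy as np

def sigma(x):  # logistic squashing function
    return 1.0 / (1.0 + np.exp(-x))

def approx(f, n, steep=200.0):
    """Approximate f on [0, 1] by f(0) plus a sum of steep logistic
    units, each carrying the increment of f across one subinterval."""
    def g(x):
        out = f(0.0) * np.ones_like(x)
        for k in range(1, n + 1):
            out += (f(k / n) - f((k - 1) / n)) * sigma(steep * (x - k / n))
        return out
    return g

f = lambda x: np.sin(2 * np.pi * x)
g = approx(f, n=50)
x = np.linspace(0.0, 1.0, 1000)
print(np.max(np.abs(f(x) - g(x))))  # uniform error shrinks as n grows
```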


Morph-SSL: Self-Supervision with Longitudinal Morphing to Predict AMD Progression from OCT

Chakravarty, Arunava, Emre, Taha, Leingang, Oliver, Riedl, Sophie, Mai, Julia, Scholl, Hendrik P. N., Sivaprasad, Sobha, Rueckert, Daniel, Lotery, Andrew, Schmidt-Erfurth, Ursula, Bogunović, Hrvoje

arXiv.org Artificial Intelligence

The lack of reliable biomarkers makes predicting the conversion from intermediate to neovascular age-related macular degeneration (iAMD, nAMD) a challenging task. We develop a Deep Learning (DL) model to predict the future risk of conversion of an eye from iAMD to nAMD from its current OCT scan. Although eye clinics generate vast amounts of longitudinal OCT scans to monitor AMD progression, only a small subset can be manually labeled for supervised DL. To address this issue, we propose Morph-SSL, a novel Self-supervised Learning (SSL) method for longitudinal data. It uses pairs of unlabelled OCT scans from different visits and involves morphing the scan from the previous visit to the next. The Decoder predicts the transformation for morphing and ensures a smooth feature manifold that can generate intermediate scans between visits through linear interpolation. Next, the Morph-SSL trained features are input to a Classifier which is trained in a supervised manner to model the cumulative probability distribution of the time to conversion with a sigmoidal function. Morph-SSL was trained on unlabelled scans of 399 eyes (3570 visits). The Classifier was evaluated with a five-fold cross-validation on 2418 scans from 343 eyes with clinical labels of the conversion date. The Morph-SSL features achieved an AUC of 0.766 in predicting the conversion to nAMD within the next 6 months, outperforming the same network when trained end-to-end from scratch or pre-trained with popular SSL methods. Automated prediction of the future risk of nAMD onset can enable timely treatment and individualized AMD management.
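
A minimal sketch of the final modeling step mentioned above, a sigmoidal cumulative probability of conversion as a function of time; the parametrization and parameter values are assumptions for illustration, not the paper's:

```python
import numpy as np

def conversion_cdf(t, t0, scale):
    """Sigmoidal cumulative probability that conversion to nAMD
    occurs within t months of the current scan."""
    return 1.0 / (1.0 + np.exp(-(t - t0) / scale))

# Example: risk of conversion within the next 6 months for an eye whose
# predicted midpoint t0 is 12 months with a 4-month scale (hypothetical values).
print(conversion_cdf(6.0, t0=12.0, scale=4.0))
```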


Continuous approximation by convolutional neural networks with a sigmoidal function

Chang, Weike

arXiv.org Artificial Intelligence

In this paper we present a class of convolutional neural networks (CNNs), called non-overlapping CNNs, in the study of the approximation capabilities of CNNs. We prove that such networks with a sigmoidal activation function are capable of approximating arbitrary continuous functions defined on compact input sets with any desired degree of accuracy. This result extends existing results, in which only multilayer feedforward networks were considered as approximators. Evaluations illustrate the accuracy and efficiency of our result and indicate that the proposed non-overlapping CNNs are less sensitive to noise.
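
A minimal numpy sketch of one plausible reading of a non-overlapping convolutional layer, a 1D convolution whose stride equals the filter width so that receptive fields tile the input without overlap, followed by a sigmoidal activation (the paper's exact architecture may differ):

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def nonoverlap_conv1d(x, w, b):
    """1D convolution with stride == len(w): the windows tile the
    input without overlap; a sigmoidal activation is applied after."""
    k = len(w)
    windows = x[: (len(x) // k) * k].reshape(-1, k)
    return sigma(windows @ w + b)

x = np.linspace(0.0, 1.0, 12)
out = nonoverlap_conv1d(x, w=np.array([0.5, -0.2, 0.1]), b=0.0)  # shape (4,)
```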



A Unified and Constructive Framework for the Universality of Neural Networks

Bui-Thanh, Tan

arXiv.org Machine Learning

One of the reasons why many neural networks are capable of replicating complicated tasks or functions is their universal approximation property. Though the past few decades have seen tremendous advances in theories of neural networks, a single constructive framework for neural network universality remains unavailable. This paper is an effort to provide a unified and constructive framework for the universality of a large class of activations, including most existing ones. At the heart of the framework is the concept of neural network approximate identity (nAI). The main result is: {\em any nAI activation function is universal}. It turns out that most existing activations are nAI, and thus universal in the space of continuous functions on compacta. The framework has the following main properties. First, it is constructive, using elementary means from functional analysis, probability theory, and numerical analysis. Second, it is the first unified attempt that is valid for most existing activations. Third, as a by-product, the framework provides the first universality proof for some existing activation functions, including Mish, SiLU, ELU, and GELU. Fourth, it provides new proofs for most activation functions. Fifth, it discovers new activations with a guaranteed universality property. Sixth, for a given activation and error tolerance, the framework provides precisely the architecture of the corresponding one-hidden-layer neural network with a predetermined number of neurons and the values of the weights and biases. Seventh, the framework allows us to abstractly present the first universal approximation result with a favorable non-asymptotic rate.
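
The central device, an activation whose scaled combinations behave like an approximate identity, can be illustrated with a hedged numpy sketch: a bump built from two shifted sigmoids, scaled and normalized, is integrated against samples of f, and the approximation tightens as the scale grows. This illustrates the idea only and is not the paper's construction:

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def bump(u):
    # difference of shifted sigmoids; integrates to 2 over the real line
    return sigma(u + 1.0) - sigma(u - 1.0)

def nai_approx(f, x, n, m=2000):
    """Approximate f(x) by a quadrature of f against the scaled,
    normalized bump n*bump(n*t)/2, which acts as an approximate identity."""
    grid = np.linspace(-1.0, 2.0, m)  # sample f on a slightly larger interval
    weights = (grid[1] - grid[0]) * n * bump(n * (x[:, None] - grid[None, :])) / 2.0
    return weights @ f(grid)

f = lambda t: np.sin(2 * np.pi * t)
x = np.linspace(0.0, 1.0, 200)
for n in (20, 40, 80):
    print(n, np.max(np.abs(nai_approx(f, x, n) - f(x))))  # error shrinks as n grows
```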


Representation Theorem for Matrix Product States

Guo, Erdong, Draper, David

arXiv.org Machine Learning

In this work, we investigate the universal representation capacity of Matrix Product States (MPS) from the perspective of boolean functions and continuous functions. We show that MPS can accurately realize arbitrary boolean functions by providing a construction of the corresponding MPS structure for an arbitrarily given boolean gate. Moreover, we prove that the function space of MPS with the scale-invariant sigmoidal activation is dense in the space of continuous functions defined on a compact subspace of the $n$-dimensional real coordinate space $\mathbb{R}^{n}$. We study the relation between MPS and neural networks and show that an MPS with a scale-invariant sigmoidal function is equivalent to a one-hidden-layer neural network equipped with a kernel function. We construct the equivalent neural networks for several specific MPS models and show that non-linear kernels, such as the polynomial kernel, which introduce couplings between different components of the input, appear naturally in the equivalent neural networks. Finally, we discuss the realization of a Gaussian Process (GP) with infinitely wide MPS by studying their equivalent neural networks.
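
A minimal numpy sketch of evaluating an MPS on a boolean input: one matrix is picked per site according to the input bit, the matrices are multiplied in order, and the product is contracted with boundary vectors. The bond dimension and random tensors are purely illustrative:

```python
import numpy as np

def mps_eval(bits, tensors, left, right):
    """Evaluate an MPS: for each site i, pick the matrix tensors[i][bit]
    and multiply them in order between boundary vectors left and right."""
    v = left
    for bit, site in zip(bits, tensors):
        v = v @ site[bit]
    return float(v @ right)

# Tiny 2-site MPS with bond dimension 2 (random tensors, for illustration only).
rng = np.random.default_rng(0)
tensors = [rng.standard_normal((2, 2, 2)) for _ in range(2)]  # (bit, bond, bond)
left, right = np.array([1.0, 0.0]), np.array([1.0, 0.0])
print(mps_eval([0, 1], tensors, left, right))
```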


The Math Behind Logistic Regression

#artificialintelligence

Have you ever wondered how logistic regression works and how its loss function is minimized by gradient descent? This article is for you. Before starting with logistic regression, it is important to understand what supervised learning is. Supervised learning means training a model on a dataset that contains a target (output) column.
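
A minimal numpy sketch of the mechanics the article goes on to describe, the sigmoid turning a linear score into a probability and batch gradient descent minimizing the log loss; the toy data, learning rate, and epoch count are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Minimize the log loss (binary cross-entropy) by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)  # gradient of the mean log loss w.r.t. w
        grad_b = np.mean(p - y)          # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy 1-D data: class 1 when the feature exceeds roughly 0.5.
X = np.array([[0.1], [0.2], [0.4], [0.6], [0.8], [0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic(X, y)
print(sigmoid(X @ w + b).round(2))  # probabilities increase with the feature
```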