relu nn
Pathwise Explanation of ReLU Neural Networks
Lim, Seongwoo, Jo, Won, Lee, Joohyung, Choi, Jaesik
Neural networks have demonstrated a wide range of successes, but their ``black box" nature raises concerns about transparency and reliability. Previous research on ReLU networks has sought to unwrap these networks into linear models based on activation states of all hidden units. In this paper, we introduce a novel approach that considers subsets of the hidden units involved in the decision making path. This pathwise explanation provides a clearer and more consistent understanding of the relationship between the input and the decision-making process. Our method also offers flexibility in adjusting the range of explanations within the input, i.e., from an overall attribution input to particular components within the input. Furthermore, it allows for the decomposition of explanations for a given input for more detailed explanations. Experiments demonstrate that our method outperforms others both quantitatively and qualitatively.
Extracting Forward Invariant Sets from Neural Network-Based Control Barrier Functions
Vaisi, Goli, Ferlez, James, Shoukry, Yasser
Training Neural Networks (NNs) to serve as Barrier Functions (BFs) is a popular way to improve the safety of autonomous dynamical systems. Despite significant practical success, these methods are not generally guaranteed to produce true BFs in a provable sense, which undermines their intended use as safety certificates. In this paper, we consider the problem of formally certifying a learned NN as a BF with respect to state avoidance for an autonomous system: viz. computing a region of the state space on which the candidate NN is provably a BF. In particular, we propose a sound algorithm that efficiently produces such a certificate set for a shallow NN. Our algorithm combines two novel approaches: it first uses NN reachability tools to identify a subset of states for which the output of the NN does not increase along system trajectories; then, it uses a novel enumeration algorithm for hyperplane arrangements to find the intersection of the NN's zero-sub-level set with the first set of states. In this way, our algorithm soundly finds a subset of states on which the NN is certified as a BF. We further demonstrate the effectiveness of our algorithm at certifying for real-world NNs as BFs in two case studies. We complemented these with scalability experiments that demonstrate the efficiency of our algorithm.
Deep ReLU networks -- injectivity capacity upper bounds
We study deep ReLU feed forward neural networks (NN) and their injectivity abilities. The main focus is on \emph{precisely} determining the so-called injectivity capacity. For any given hidden layers architecture, it is defined as the minimal ratio between number of network's outputs and inputs which ensures unique recoverability of the input from a realizable output. A strong recent progress in precisely studying single ReLU layer injectivity properties is here moved to a deep network level. In particular, we develop a program that connects deep $l$-layer net injectivity to an $l$-extension of the $\ell_0$ spherical perceptrons, thereby massively generalizing an isomorphism between studying single layer injectivity and the capacity of the so-called (1-extension) $\ell_0$ spherical perceptrons discussed in [82]. \emph{Random duality theory} (RDT) based machinery is then created and utilized to statistically handle properties of the extended $\ell_0$ spherical perceptrons and implicitly of the deep ReLU NNs. A sizeable set of numerical evaluations is conducted as well to put the entire RDT machinery in practical use. From these we observe a rapidly decreasing tendency in needed layers' expansions, i.e., we observe a rapid \emph{expansion saturation effect}. Only $4$ layers of depth are sufficient to closely approach level of no needed expansion -- a result that fairly closely resembles observations made in practical experiments and that has so far remained completely untouchable by any of the existing mathematical methodologies.
Equidistribution-based training of Free Knot Splines and ReLU Neural Networks
Appella, Simone, Arridge, Simon, Budd, Chris, Deveney, Teo, Kreusser, Lisa Maria
We consider the problem of one-dimensional function approximation using shallow neural networks (NN) with a rectified linear unit (ReLU) activation function and compare their training with traditional methods such as univariate Free Knot Splines (FKS). ReLU NNs and FKS span the same function space, and thus have the same theoretical expressivity. In the case of ReLU NNs, we show that their ill-conditioning degrades rapidly as the width of the network increases. This often leads to significantly poorer approximation in contrast to the FKS representation, which remains well-conditioned as the number of knots increases. We leverage the theory of optimal piecewise linear interpolants to improve the training procedure for a ReLU NN. Using the equidistribution principle, we propose a two-level procedure for training the FKS by first solving the nonlinear problem of finding the optimal knot locations of the interpolating FKS. Determining the optimal knots then acts as a good starting point for training the weights of the FKS. The training of the FKS gives insights into how we can train a ReLU NN effectively to give an equally accurate approximation. More precisely, we combine the training of the ReLU NN with an equidistribution based loss to find the breakpoints of the ReLU functions, combined with preconditioning the ReLU NN approximation (to take an FKS form) to find the scalings of the ReLU functions, leads to a well-conditioned and reliable method of finding an accurate ReLU NN approximation to a target function. We test this method on a series or regular, singular, and rapidly varying target functions and obtain good results realising the expressivity of the network in this case.
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Qiao, Dan, Zhang, Kaiqi, Singh, Esha, Soudry, Daniel, Wang, Yu-Xiang
We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (\emph{e.g.} NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, global optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can \emph{stably} converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted \emph{first order total variation} bounded by $1/\eta - 1/2 + \widetilde{O}(\sigma + \sqrt{\mathrm{MSE}})$ where $\sigma$ is the label noise level, $\mathrm{MSE}$ is short for mean squared error against the ground truth, and $\widetilde{O}(\cdot)$ hides a logarithmic factor. Under mild assumptions, we also prove a nearly-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points. Our theoretical results are validated by extensive simulation that demonstrates large learning rate training induces sparse linear spline fits. To the best of our knowledge, we are the first to obtain generalization bound via minima stability in the non-interpolation case and the first to show ReLU NNs without regularization can achieve near-optimal rates in nonparametric regression.
Shallow ReLU neural networks and finite elements
We point out that (continuous or discontinuous) piecewise linear functions on a convex polytope mesh can be represented by two-hidden-layer ReLU neural networks in a weak sense. In addition, the numbers of neurons of the two hidden layers required to weakly represent are accurately given based on the numbers of polytopes and hyperplanes involved in this mesh. The results naturally hold for constant and linear finite element functions. Such weak representation establishes a bridge between shallow ReLU neural networks and finite element functions, and leads to a perspective for analyzing approximation capability of ReLU neural networks in $L^p$ norm via finite element functions. Moreover, we discuss the strict representation for tensor finite element functions via the recent tensor neural networks.
Neural Networks for Singular Perturbations
Opschoor, Joost A. A., Schwab, Christoph, Xenophontos, Christos
We prove deep neural network (DNN for short) expressivity rate bounds for solution sets of a model class of singularly perturbed, elliptic two-point boundary value problems, in Sobolev norms, on the bounded interval $(-1,1)$. We assume that the given source term and reaction coefficient are analytic in $[-1,1]$. We establish expression rate bounds in Sobolev norms in terms of the NN size which are uniform with respect to the singular perturbation parameter for several classes of DNN architectures. In particular, ReLU NNs, spiking NNs, and $\tanh$- and sigmoid-activated NNs. The latter activations can represent ``exponential boundary layer solution features'' explicitly, in the last hidden layer of the DNN, i.e. in a shallow subnetwork, and afford improved robust expression rate bounds in terms of the NN size. We prove that all DNN architectures allow robust exponential solution expression in so-called `energy' as well as in `balanced' Sobolev norms, for analytic input data.
Null Space Properties of Neural Networks with Applications to Image Steganography
Neural networks are powerful learning methods in use for various tasks today. This is especially true in the domain of image recognition, where neural networks can achieve even human-competitive results[13]. However, a number of studies have revealed that neural networks for image classification can be easily influenced to misclassify by modifying images[1]. In 2014, Szegedy et al. first discovered an intriguing weakness of deep neural networks[15]. They showed that neural networks for image classification can be easily fooled by small perturbations, and they called these intentionally modified images adversarial examples. Following this observation, numerous studies have been carried out to find different ways to generate adversarial examples[7, 11, 13]. The main idea is to find a subtle perturbation that can drastically change the output of a neural network by adding it to the data. It is observed that adversarial examples have good transferability across models, which suggests that the existence of adversarial examples is also a property of datasets[8], thus adversarial examples are not restricted only to the given model. In our study, we aim to find a model-based method to fool the neural networks.