
Adaptive Sampling for Continuous Group Equivariant Neural Networks

Inal, Berfin, Cesa, Gabriele

arXiv.org Artificial Intelligence

Steerable networks, which process data with intrinsic symmetries, often use Fourier-based nonlinearities that require sampling from the entire group, leading to a need for discretization in continuous groups. As the number of samples increases, both performance and equivariance improve, yet this also leads to higher computational costs. To address this, we introduce an adaptive sampling approach that dynamically adjusts the sampling process to the symmetries in the data, reducing the number of required group samples and lowering the computational demands. We explore various implementations and their effects on model performance, equivariance, and computational efficiency. Our findings demonstrate improved model performance and a marginal increase in memory efficiency.


Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

Lin, Tzu-Yuan, Zhu, Minghan, Ghaffari, Maani

arXiv.org Artificial Intelligence

This paper proposes an adjoint-equivariant neural network that takes Lie algebra data as input. Various types of equivariant neural networks have been proposed in the literature, which treat the input data as elements in a vector space carrying certain types of transformations. In comparison, we aim to process inputs that are transformations between vector spaces. The change of basis on transformation is described by conjugations, inducing the adjoint-equivariance relationship that our model is designed to capture. Leveraging the invariance property of the Killing form, the proposed network is a general framework that works for arbitrary semisimple Lie algebras. Our network possesses a simple structure that can be viewed as a Lie algebraic generalization of a multi-layer perceptron (MLP). This work extends the application of equivariant feature learning. Respecting the symmetry in data is essential for deep learning models to understand the underlying objects.


Specification-Driven Neural Network Reduction for Scalable Formal Verification

Ladner, Tobias, Althoff, Matthias

arXiv.org Artificial Intelligence

Formal verification of neural networks is essential before their deployment in safety-critical settings. However, existing methods for formally verifying neural networks are not yet scalable enough to handle practical problems that involve a large number of neurons. In this work, we propose a novel approach to address this challenge: a conservative neural network reduction approach that ensures that the verification of the reduced network implies the verification of the original network. Our approach constructs the reduction on-the-fly, while simultaneously verifying the original network and its specifications. The reduction merges all neurons of a nonlinear layer with similar outputs and is applicable to neural networks with any type of activation function, such as ReLU, sigmoid, and tanh. Our evaluation shows that our approach can reduce a network to less than 5% of its original number of neurons, reducing verification time by a similar degree.


Deep Learning Architectures

#artificialintelligence

There are several types of deep learning architectures, also known as artificial neural networks with multiple nonlinear layers. The characteristics of the input data and the objective of the research determine which deep learning architecture should be used and when. Deep Neural Network (DNN): the various deep learning architectures in this family are designed from the basic building blocks of neural networks. These include the Multilayer Perceptron (MLP), which uses perceptrons; the Stacked Auto-Encoder (SAE), which uses auto-encoders; and Deep Belief Networks (DBNs), which use Restricted Boltzmann Machines (RBMs). Convolutional Neural Network (CNN): CNN architectures consist of different layer types, such as convolution layers, nonlinear layers, and pooling layers.
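As a minimal sketch of the three CNN layer types named above (convolution, a nonlinearity, and pooling), the toy functions below apply them in sequence to a tiny 2D input. This is illustrative only, not code from any of the papers listed here; the input image and the 2x2 filter are made up for the example, and real frameworks implement these layers far more efficiently.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of `image` with `kernel` (the convolution layer)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise ReLU (the nonlinear layer)."""
    return [[max(0, x) for x in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a `size` x `size` window (the pooling layer)."""
    return [
        [
            max(fmap[i + u][j + v] for u in range(size) for v in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

# Toy 5x5 input and a hypothetical 2x2 filter.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 1],
    [0, 2, 1, 0, 2],
]
edge_kernel = [[1, -1], [-1, 1]]

# Convolution -> nonlinearity -> pooling, the ordering described above.
features = max_pool(relu(conv2d(image, edge_kernel)))
print(features)  # -> [[3, 0], [3, 2]]
```

Stacking several such convolution/nonlinearity/pooling stages, followed by an MLP-style classifier head, yields the standard CNN layout.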


Adaptive deep density approximation for Fokker-Planck equations

Tang, Kejun, Wan, Xiaoliang, Liao, Qifeng

arXiv.org Machine Learning

In this paper we present a novel adaptive deep density approximation strategy based on KRnet (ADDA-KR) for solving the steady-state Fokker-Planck equation. It is known that this equation typically has high-dimensional spatial variables posed on unbounded domains, which limit the application of traditional grid-based numerical methods. With the Knothe-Rosenblatt rearrangement, our newly proposed flow-based generative model, called KRnet, provides a family of probability density functions to serve as effective solution candidates of the Fokker-Planck equation, which have weaker dependence on dimensionality than traditional computational approaches. To generate effective stochastic collocation points for training KRnet, we develop an adaptive sampling procedure, where samples are generated iteratively using KRnet at each iteration. In addition, we give a detailed discussion of KRnet and show that it can efficiently estimate general high-dimensional density functions. We present a general mathematical framework of ADDA-KR, validate its accuracy, and demonstrate its efficiency with numerical experiments.


Kernel-Based Smoothness Analysis of Residual Networks

Tirer, Tom, Bruna, Joan, Giryes, Raja

arXiv.org Machine Learning

A major factor in the success of deep neural networks is the use of sophisticated architectures rather than the classical multilayer perceptron (MLP). Residual networks (ResNets) stand out among these powerful modern architectures. Previous works focused on the optimization advantages of deep ResNets over deep MLPs. In this paper, we show another distinction between the two models, namely, a tendency of ResNets to promote smoother interpolations than MLPs. We analyze this phenomenon via the neural tangent kernel (NTK) approach. First, we compute the NTK for a considered ResNet model and prove its stability during gradient descent training. Then, we show by various evaluation methodologies that the NTK of ResNet, and its kernel regression results, are smoother than the ones of MLP. The better smoothness observed in our analysis may explain the better generalization ability of ResNets and the practice of moderately attenuating the residual blocks.