lenet-5
On the Adversarial Robustness of Spiking Neural Networks Trained by Local Learning
Lin, Jiaqi, Sengupta, Abhronil
Recent research has shown the vulnerability of Spiking Neural Networks (SNNs) to adversarial examples that are nearly indistinguishable from clean data in the context of frame-based and event-based information. The majority of these studies are limited to generating adversarial examples using Backpropagation Through Time (BPTT), a gradient-based method which lacks biological plausibility. In contrast, local learning methods, which relax many of BPTT's constraints, remain under-explored in the context of adversarial attacks. To address this problem, we examine adversarial robustness in SNNs through the framework of four types of training algorithms. We provide an in-depth analysis of the ineffectiveness of gradient-based adversarial attacks in generating adversarial instances in this scenario. To overcome these limitations, we introduce a hybrid adversarial attack paradigm that leverages the transferability of adversarial instances. The proposed hybrid approach outperforms existing adversarial attack methods. Furthermore, the generalizability of the method is assessed under multi-step adversarial attacks, adversarial attacks in black-box FGSM scenarios, and within the non-spiking domain.
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
On Hardening DNNs against Noisy Computations
Wang, Xiao, Borras, Hendrik, Klein, Bernhard, Fröning, Holger
The success of deep learning has sparked significant interest in designing computer hardware optimized for the high computational demands of neural network inference. As further miniaturization of digital CMOS processors becomes increasingly challenging, alternative computing paradigms, such as analog computing, are gaining consideration. Particularly for compute-intensive tasks such as matrix multiplication, analog computing presents a promising alternative due to its potential for significantly higher energy efficiency compared to conventional digital technology. However, analog computations are inherently noisy, which makes it challenging to maintain high accuracy on deep neural networks. This work investigates the effectiveness of training neural networks with quantization to increase the robustness against noise. Experimental results across various network architectures show that quantization-aware training with constant scaling factors enhances robustness. We compare these methods with noisy training, which injects noise during training that mimics the noise encountered during inference. While both methods increase tolerance against noise, noisy training emerges as the superior approach for achieving robust neural network performance, especially in complex neural architectures.
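The noise injection described in this abstract can be pictured with a small sketch. It assumes additive, zero-mean Gaussian noise on each output of a matrix-vector product; the abstract does not state the noise model, and `noisy_matvec` and `noise_std` are hypothetical names chosen for illustration.

```python
import random


def noisy_matvec(weights, x, noise_std, rng):
    """Matrix-vector product with noise added to every output element.

    Assumption: the noise is additive, zero-mean Gaussian with standard
    deviation `noise_std`. The source abstract does not specify this.
    """
    outputs = []
    for row in weights:
        clean = sum(w * xi for w, xi in zip(row, x))
        outputs.append(clean + rng.gauss(0.0, noise_std))
    return outputs


# The same call is used in the training forward pass so that the weights
# adapt to the kind of perturbation expected at inference time.
rng = random.Random(0)
weights = [[1.0, 2.0], [3.0, 4.0]]
noisy_out = noisy_matvec(weights, [1.0, 1.0], noise_std=0.1, rng=rng)
clean_out = noisy_matvec(weights, [1.0, 1.0], noise_std=0.0, rng=rng)
```

With `noise_std=0.0` the function reduces to an ordinary matrix-vector product, which makes it easy to compare noisy and clean runs of the same layer.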
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Hawaii (0.04)
- Europe > Germany (0.04)
ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization
Sugiura, Keisuke, Matsutani, Hiroki
First-order (FO) optimization algorithms with backpropagation (BP) [1, 2, 3, 4, 5] have been predominantly used for training deep neural networks (DNNs) thanks to the wide support in popular DL frameworks. While BP provides a systematic way to compute FO gradients via the chain rule by traversing the computational graph, it needs to save intermediate activations as well as gradients (with respect to parameters), which incurs considerably higher memory requirements than inference [6] and may pose challenges for deployment on memory-constrained platforms (e.g., Raspberry Pi Zero). Besides, advanced FO optimizers consume extra memory to store optimizer states such as momentum (a running average of past gradients) and a copy of the trainable parameters. Given this situation, zeroth-order (ZO) optimization has seen a resurgence of interest in the recent literature as a simple yet powerful alternative to FO methods [7, 8]. One notable feature of ZO methods is that they require only two forward passes per input during training. Since ZO gradients can be obtained from DNN outputs (loss values), a ZO-based approach becomes an attractive choice when FO gradients are infeasible to obtain or not available (e.g., non-differentiable loss functions). ZO optimization has been applied to a wide range of practical applications including black-box adversarial attacks [9, 10, 11] (where attackers only have access to DNN inputs and outputs), black-box defense [12, 13], neural architecture search [14, 15], sensor selection in wireless networks [16], coverage maximization in cellular networks [17, 18], and reinforcement learning from human feedback [19, 20]. Since ZO methods bypass BP, they do not need to retain computational graphs, intermediate activations, or gradients.
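The "two forward passes" idea can be sketched with a two-point, random-direction gradient estimate. This is a generic illustration of zeroth-order optimization under stated assumptions (Gaussian perturbations, a central difference, plain gradient descent), not the ElasticZO algorithm itself; `zo_gradient`, `zo_sgd`, `eps`, and `lr` are hypothetical names.

```python
import random


def zo_gradient(loss_fn, params, eps, rng):
    """Estimate the gradient from two loss evaluations only.

    Assumption: a random Gaussian direction z and a central difference,
    which is one common two-point ZO estimator.
    """
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Two forward passes: one at params + eps*z, one at params - eps*z.
    scale = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [scale * zi for zi in z]


def zo_sgd(loss_fn, params, lr, eps, steps, seed=0):
    """Plain gradient descent driven by the ZO estimate (no backpropagation)."""
    rng = random.Random(seed)
    params = list(params)
    for _ in range(steps):
        grad = zo_gradient(loss_fn, params, eps, rng)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def quadratic_loss(params):
    """Toy loss with its minimum at (1.0, -2.0), standing in for a DNN loss."""
    target = [1.0, -2.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


start = [0.0, 0.0]
final = zo_sgd(quadratic_loss, start, lr=0.05, eps=1e-3, steps=500)
```

Only loss values are used, so no activations or computational graph need to be stored; the price is a noisier gradient estimate than backpropagation would give.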
- Information Technology (1.00)
- Energy > Oil & Gas (1.00)
USEFUSE: Utile Stride for Enhanced Performance in Fused Layer Architecture of Deep Neural Networks
Ibrahim, Muhammad Sohail, Usman, Muhammad, Lee, Jeong-A
A deep neural network (DNN) is an artificial neural network with several layers between the input and output layers. DNNs have been widely used in image recognition [1], semantic segmentation [2], medical imaging [3], bioinformatics [4], and signal processing [5]. Convolutional neural networks (CNNs) are a class of DNN that plays a pivotal role in many applications such as computer vision, recognition, and object detection. This has been made possible by advancements in high-performance computing technologies and the availability of cutting-edge compute resources. The use of CNNs with many layers has enabled swift progress in a number of diverse application domains. CNN designs, inspired by the behavior of optic nerves in the human brain, perform data processing in multiple layers of neurons to achieve human brain-like performance in image recognition. This research was supported by Basic Science Research Program funded by the Ministry of Education through the National Research Foundation of Korea (NRF-2020R1I1A3063857). The EDA tool was supported by the IC Design Education Center (IDEC), Korea.
- Europe > Germany > Bavaria > Regensburg (0.04)
- Asia (0.04)
- Health & Medicine > Therapeutic Area (0.48)
- Information Technology > Software (0.34)
Methodology to Deploy CNN-Based Computer Vision Models on Immersive Wearable Devices
Convolutional Neural Network (CNN) models often lack the ability to incorporate human input, which can be addressed by Augmented Reality (AR) headsets. However, current AR headsets face limitations in processing power, which has prevented researchers from performing real-time, complex image recognition tasks using CNNs in AR headsets. This paper presents a method to deploy CNN models on AR headsets by training them on computers and transferring the optimized weight matrices to the headset. The approach transforms the image data and CNN layers into a one-dimensional format suitable for the AR platform. We demonstrate this method by training the LeNet-5 CNN model on the MNIST dataset using PyTorch and deploying it on a HoloLens AR headset. The results show that the model maintains an accuracy of approximately 98%, similar to its performance on a computer. This integration of CNN and AR enables real-time image processing on AR headsets, allowing for the incorporation of human input into AI models.
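The one-dimensional format mentioned in this abstract can be pictured as row-major flattening of an image. The sketch below is an assumption for illustration only, not the authors' procedure; `flatten_image` and `flat_index` are hypothetical names.

```python
def flatten_image(image):
    """Flatten a 2D image (list of rows) into one list, row by row.

    Assumption: row-major order. The abstract does not state the layout.
    """
    return [pixel for row in image for pixel in row]


def flat_index(row, col, width):
    """Position of pixel (row, col) in the flattened list."""
    return row * width + col


image = [
    [1, 2, 3],
    [4, 5, 6],
]
flat = flatten_image(image)
pixel = flat[flat_index(1, 2, width=3)]  # same pixel as image[1][2]
```

Keeping an explicit index formula lets layer code written for 2D data address the same values in the flattened buffer.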
- North America > United States > New Mexico (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > China > Hong Kong (0.04)
Enhancing Fault Resilience of QNNs by Selective Neuron Splitting
Ahmadilivani, Mohammad Hasan, Taheri, Mahdi, Raik, Jaan, Daneshtalab, Masoud, Jenihhin, Maksim
The superior performance of Deep Neural Networks (DNNs) has led to their application in various aspects of human life. Safety-critical applications are no exception and impose rigorous reliability requirements on DNNs. Quantized Neural Networks (QNNs) have emerged to tackle the complexity of DNN accelerators; however, they are more prone to reliability issues. In this paper, a recent analytical resilience assessment method is adapted for QNNs to identify critical neurons based on a Neuron Vulnerability Factor (NVF). Thereafter, a novel method for splitting the critical neurons is proposed that enables the design of a Lightweight Correction Unit (LCU) in the accelerator without redesigning its computational part. The method is validated by experiments on different QNNs and datasets. The results demonstrate that the proposed fault-correction method has half the overhead of selective Triple Modular Redundancy (TMR) while achieving a similar level of fault resiliency.
- Europe > Estonia > Harju County > Tallinn (0.04)
- Oceania > Fiji (0.04)
- Europe > Sweden > Västmanland County > Västerås (0.04)
Compressing neural network by tensor network with exponentially fewer variational parameters
Qing, Yong, Zhou, Peng-Fei, Li, Ke, Ran, Shi-Ju
A neural network (NN) designed for challenging machine learning tasks is in general a highly nonlinear mapping that contains a massive number of variational parameters. High complexity of NN, if unbounded or unconstrained, might unpredictably cause severe issues including over-fitting, loss of generalization power, and unbearable hardware cost. In this work, we propose a general compression scheme that significantly reduces the variational parameters of NN by encoding them into multi-layer tensor networks (TN's) that contain exponentially fewer free parameters. Superior compression performance of our scheme is demonstrated on several widely recognized NN's (FC-2, LeNet-5, and VGG-16) and datasets (MNIST and CIFAR-10), surpassing the state-of-the-art method based on shallow tensor networks. For instance, about 10 million parameters in the three convolutional layers of VGG-16 are compressed in TN's with just $632$ parameters, while the testing accuracy on CIFAR-10 is surprisingly improved from $81.14\%$ by the original NN to $84.36\%$ after compression. Our work suggests TN as an exceptionally efficient mathematical structure for representing the variational parameters of NN's, which exploits compressibility better than simple multi-way arrays.
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Sichuan Province > Chengdu (0.04)
- North America > United States (0.04)
Improving Classification Neural Networks by using Absolute activation function (MNIST/LeNET-5 example)
The paper discusses the use of the Absolute activation function in classification neural networks. Examples are shown of using this activation function in simple and more complex problems. Using the LeNet-5 network as a baseline for solving the MNIST problem, the efficiency of the Absolute activation function is shown in comparison with the Tanh, ReLU and SeLU activations. It is shown that in deep networks Absolute activation does not cause vanishing and exploding gradients, and therefore Absolute activation can be used in both simple and deep neural networks. Due to the high volatility of training networks with Absolute activation, a special modification of the ADAM training algorithm is used. It estimates a lower bound of accuracy on any test dataset using validation dataset analysis at each training epoch, uses this value to stop training or decrease the learning rate, and re-initializes the ADAM algorithm between these steps. It is shown that solving the MNIST problem with LeNet-like architectures based on Absolute activation makes it possible to significantly reduce the number of trained parameters in the neural network while improving the prediction accuracy.
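A minimal sketch of the activation itself may help, assuming "Absolute activation" means f(x) = |x|. The derivative has magnitude 1 everywhere except at zero, in contrast to Tanh, whose derivative shrinks toward zero for large inputs. The function names and the choice of 0 as the derivative at x = 0 are assumptions made for illustration.

```python
import math


def absolute(x):
    """Absolute activation, assumed here to be f(x) = |x|."""
    return abs(x)


def absolute_grad(x):
    """Derivative of |x|: +1 for x > 0, -1 for x < 0.

    Assumption: the value 0.0 is used at x == 0, where |x| is not
    differentiable.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def tanh_grad(x):
    """Derivative of tanh, shown only for comparison."""
    return 1.0 - math.tanh(x) ** 2


# The magnitude of the Absolute gradient does not depend on how large
# the input is, while the Tanh gradient saturates.
abs_slope_far = absolute_grad(-5.0)
tanh_slope_far = tanh_grad(5.0)
```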
Problem-dependent attention and effort in neural networks with applications to image resolution and model selection
This paper introduces two new ensemble-based methods to reduce the data and computation costs of image classification. They can be used with any set of classifiers and do not require additional training. In the first approach, data usage is reduced by only analyzing a full-sized image if the model has low confidence in classifying a low-resolution pixelated version. When applied on the best performing classifiers considered here, data usage is reduced by 61.2% on MNIST, 69.6% on KMNIST, 56.3% on FashionMNIST, 84.6% on SVHN, 40.6% on ImageNet, and 27.6% on ImageNet-V2, all with a less than 5% reduction in accuracy. However, for CIFAR-10, the pixelated data are not particularly informative, and the ensemble approach increases data usage while reducing accuracy. In the second approach, compute costs are reduced by only using a complex model if a simpler model has low confidence in its classification. Computation cost is reduced by 82.1% on MNIST, 47.6% on KMNIST, 72.3% on FashionMNIST, 86.9% on SVHN, 89.2% on ImageNet, and 81.5% on ImageNet-V2, all with a less than 5% reduction in accuracy; for CIFAR-10 the corresponding improvements are smaller at 13.5%. When cost is not an object, choosing the projection from the most confident model for each observation increases validation accuracy to 81.0% from 79.3% for ImageNet and to 69.4% from 67.5% for ImageNet-V2.
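The second approach in this abstract, using a complex model only when a simpler one has low confidence, can be sketched as a two-stage cascade. The sketch assumes that confidence is the largest predicted class probability and that a fixed threshold is used; `cascade_predict`, `threshold`, and the stub models are hypothetical and not taken from the paper.

```python
def cascade_predict(x, simple_model, complex_model, threshold):
    """Return (predicted_class, model_used) for one input.

    Assumption: each model maps an input to a list of class probabilities,
    and confidence is the largest probability in that list.
    """
    probs = simple_model(x)
    if max(probs) >= threshold:
        return probs.index(max(probs)), "simple"
    # Low confidence: pay for the expensive model on this input only.
    probs = complex_model(x)
    return probs.index(max(probs)), "complex"


def stub_simple(x):
    """Stand-in for a cheap classifier: confident only on 'easy' inputs."""
    return [0.9, 0.1] if x == "easy" else [0.55, 0.45]


def stub_complex(x):
    """Stand-in for an expensive classifier."""
    return [0.2, 0.8]


easy_result = cascade_predict("easy", stub_simple, stub_complex, threshold=0.8)
hard_result = cascade_predict("hard", stub_simple, stub_complex, threshold=0.8)
```

The same pattern applies to the first approach if the "simple" stage is read as a classifier on the low-resolution image and the "complex" stage as a classifier on the full-sized one.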
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)