Goto

Collaborating Authors

 dropconnect


Dynamic DropConnect: Enhancing Neural Network Robustness through Adaptive Edge Dropping Strategies

Yang, Yuan-Chih, Chen, Hung-Hsuan

arXiv.org Artificial Intelligence

Dropout and DropConnect are well-known techniques that apply a consistent drop rate to randomly deactivate neurons or edges in a neural network layer during training. This paper introduces a novel methodology that assigns dynamic drop rates to each edge within a layer, uniquely tailoring the dropping process without incorporating additional learning parameters. We perform experiments on synthetic and openly available datasets to validate the effectiveness of our approach. The results demonstrate that our method outperforms Dropout, DropConnect, and Standout, a classic mechanism known for its adaptive dropout capabilities. Furthermore, our approach improves the robustness and generalization of neural network training without increasing computational complexity. The complete implementation of our methodology is publicly accessible for research and replication purposes at https://github.com/ericabd888/Adjusting-the-drop-probability-in-DropConnect-based-on-the-magnitude-of-the-gradient/.


Terrain Classification Enhanced with Uncertainty for Space Exploration Robots from Proprioceptive Data

Álvarez, Mariela De Lucas, Guo, Jichen, Domínguez, Raul, Valdenegro-Toro, Matias

arXiv.org Artificial Intelligence

Terrain Classification is an essential task in space exploration, where unpredictable environments are difficult to observe using only exteroceptive sensors such as vision. Implementing Neural Network classifiers can have high performance but can be deemed untrustworthy as they lack transparency, which makes them unreliable for taking high-stakes decisions during mission planning. We address this by proposing Neural Networks with Uncertainty Quantification in Terrain Classification. We enable our Neural Networks with Monte Carlo Dropout, DropConnect, and Flipout in time series-capable architectures using only proprioceptive data as input. We use Bayesian Optimization with Hyperband for efficient hyperparameter optimization to find optimal models for trustworthy terrain classification.


A Cryogenic Memristive Neural Decoder for Fault-tolerant Quantum Error Correction

Marcotte, Frédéric, Mouny, Pierre-Antoine, Yon, Victor, Dagnew, Gebremedhin A., Kulchytskyy, Bohdan, Rochette, Sophie, Beilliard, Yann, Drouin, Dominique, Ronagh, Pooya

arXiv.org Artificial Intelligence

Neural decoders for quantum error correction (QEC) rely on neural networks to classify syndromes extracted from error correction codes and find appropriate recovery operators to protect logical information against errors. Despite the good performance of neural decoders, important practical requirements remain to be achieved, such as minimizing the decoding time to meet typical rates of syndrome generation in repeated error correction schemes, and ensuring the scalability of the decoding approach as the code distance increases. Designing a dedicated integrated circuit to perform the decoding task in co-integration with a quantum processor appears necessary to reach these decoding time and scalability requirements, as routing signals in and out of a cryogenic environment to be processed externally leads to unnecessary delays and an eventual wiring bottleneck. In this work, we report the design and performance analysis of a neural decoder inference accelerator based on an in-memory computing (IMC) architecture, where crossbar arrays of resistive memory devices are employed to both store the synaptic weights of the decoder neural network and perform analog matrix-vector multiplications during inference. In proof-of-concept numerical experiments supported by experimental measurements, we investigate the impact of TiO$_\textrm{x}$-based memristive devices' non-idealities on decoding accuracy. Hardware-aware training methods are developed to mitigate the loss in accuracy, allowing the memristive neural decoders to achieve a pseudo-threshold of $9.23\times 10^{-4}$ for the distance-three surface code, whereas the equivalent digital neural decoder achieves a pseudo-threshold of $1.01\times 10^{-3}$. This work provides a pathway to scalable, fast, and low-power cryogenic IMC hardware for integrated QEC.


Almost Sure Convergence of Dropout Algorithms for Neural Networks

Senen-Cerda, Albert, Sanders, Jaron

arXiv.org Artificial Intelligence

We investigate the convergence and convergence rate of stochastic training algorithms for Neural Networks (NNs) that have been inspired by Dropout (Hinton et al., 2012). With the goal of avoiding overfitting during training of NNs, dropout algorithms consist in practice of multiplying the weight matrices of a NN componentwise by independently drawn random matrices with $\{0, 1 \}$-valued entries during each iteration of Stochastic Gradient Descent (SGD). This paper presents a probability theoretical proof that for fully-connected NNs with differentiable, polynomially bounded activation functions, if we project the weights onto a compact set when using a dropout algorithm, then the weights of the NN converge to a unique stationary point of a projected system of Ordinary Differential Equations (ODEs). After this general convergence guarantee, we go on to investigate the convergence rate of dropout. Firstly, we obtain generic sample complexity bounds for finding $\epsilon$-stationary points of smooth nonconvex functions using SGD with dropout that explicitly depend on the dropout probability. Secondly, we obtain an upper bound on the rate of convergence of Gradient Descent (GD) on the limiting ODEs of dropout algorithms for NNs with the shape of arborescences of arbitrary depth and with linear activation functions. The latter bound shows that for an algorithm such as Dropout or Dropconnect (Wan et al., 2013), the convergence rate can be impaired exponentially by the depth of the arborescence. In contrast, we experimentally observe no such dependence for wide NNs with just a few dropout layers. We also provide a heuristic argument for this observation. Our results suggest that there is a change of scale of the effect of the dropout probability in the convergence rate that depends on the relative size of the width of the NN compared to its depth.


How to Enable Uncertainty Estimation in Proximal Policy Optimization

Bykovets, Eugene, Metz, Yannick, El-Assady, Mennatallah, Keim, Daniel A., Buhmann, Joachim M.

arXiv.org Artificial Intelligence

While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons: concepts like uncertainty and OOD states are not well defined compared to supervised learning, especially for on-policy RL methods. Secondly, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization (PPO), and present possible applicable measures. In particular, we discuss the concepts of value and policy uncertainty. The second point is addressed by implementing different uncertainty estimation methods and comparing them across a number of environments. The OOD detection performance is evaluated via a custom evaluation benchmark of in-distribution (ID) and OOD states for various RL environments. We identify a trade-off between reward and OOD detection performance. To overcome this, we formulate a Pareto optimization problem in which we simultaneously optimize for reward and OOD detection performance. We show experimentally that the recently proposed method of Masksembles strikes a favourable balance among the survey methods, enabling high-quality uncertainty estimation and OOD detection while matching the performance of original RL agents.


SoftDropConnect (SDC) -- Effective and Efficient Quantification of the Network Uncertainty in Deep MR Image Analysis

Lyu, Qing, Whitlow, Christopher T., Wang, Ge

arXiv.org Artificial Intelligence

Recently, deep learning has achieved remarkable successes in medical image analysis. Although deep neural networks generate clinically important predictions, they have inherent uncertainty. Such uncertainty is a major barrier to report these predictions with confidence. In this paper, we propose a novel yet simple Bayesian inference approach called SoftDropConnect (SDC) to quantify the network uncertainty in medical imaging tasks with gliomas segmentation and metastases classification as initial examples. Our key idea is that during training and testing SDC modulates network parameters continuously so as to allow affected information processing channels still in operation, instead of disabling them as Dropout or DropConnet does. When compared with three popular Bayesian inference methods including Bayes By Backprop, Dropout, and DropConnect, our SDC method (SDC-W after optimization) outperforms the three competing methods with a substantial margin. Quantitatively, our proposed method generates results withsubstantially improved prediction accuracy (by 10.0%, 5.4% and 3.7% respectively for segmentation in terms of dice score; by 11.7%, 3.9%, 8.7% on classification in terms of test accuracy) and greatly reduced uncertainty in terms of mutual information (by 64%, 33% and 70% on segmentation; 98%, 88%, and 88% on classification). Our approach promises to deliver better diagnostic performance and make medical AI imaging more explainable and trustworthy.


Structured DropConnect for Uncertainty Inference in Image Classification

Zheng, Wenqing, Xie, Jiyang, Liu, Weidong, Ma, Zhanyu

arXiv.org Artificial Intelligence

With the complexity of the network structure, uncertainty inference has become an important task to improve classification accuracy for artificial intelligence systems. For image classification tasks, we propose a structured DropConnect (SDC) framework to model the output of a deep neural network by a Dirichlet distribution. We introduce a DropConnect strategy on weights in the fully connected layers during training. In test, we split the network into several sub-networks, and then model the Dirichlet distribution by match its moments with the mean and variance of the outputs of these sub-networks. The entropy of the estimated Dirichlet distribution is finally utilized for uncertainty inference. In this paper, this framework is implemented on LeNet5 and VGG16 models for misclassification detection and out-of-distribution detection on MNIST and CIFAR-10 datasets. Experimental results show that the performance of the proposed SDC can be comparable to other uncertainty inference methods. Furthermore, the SDC is adapted well to different Figure 1: Illustration of the proposed structured DropConnect network structures with certain generalization capabilities and (SDC). In train phase, DropConnect is used on the research prospects.


Asymptotic convergence rate of Dropout on shallow linear neural networks

Senen-Cerda, Albert, Sanders, Jaron

arXiv.org Machine Learning

We analyze the convergence rate of gradient flows on objective functions induced by Dropout and Dropconnect, when applying them to shallow linear Neural Networks (NNs) - which can also be viewed as doing matrix factorization using a particular regularizer. Dropout algorithms such as these are thus regularization techniques that use 0,1-valued random variables to filter weights during training in order to avoid coadaptation of features. By leveraging a recent result on nonconvex optimization and conducting a careful analysis of the set of minimizers as well as the Hessian of the loss function, we are able to obtain (i) a local convergence proof of the gradient flow and (ii) a bound on the convergence rate that depends on the data, the dropout probability, and the width of the NN. Finally, we compare this theoretical bound to numerical simulations, which are in qualitative agreement with the convergence bound and match it when starting sufficiently close to a minimizer.


Know Where To Drop Your Weights: Towards Faster Uncertainty Estimation

Kamath, Akshatha, Gnaneshwar, Dwaraknath, Valdenegro-Toro, Matias

arXiv.org Machine Learning

Estimating epistemic uncertainty of models used in low-latency applications and Out-Of-Distribution samples detection is a challenge due to the computationally demanding nature of uncertainty estimation techniques. Estimating model uncertainty using approximation techniques like Monte Carlo Dropout (MCD), DropConnect (MCDC) requires a large number of forward passes through the network, rendering them inapt for low-latency applications. We propose Select-DC which uses a subset of layers in a neural network to model epistemic uncertainty with MCDC. Through our experiments, we show a significant reduction in the GFLOPS required to model uncertainty, compared to Monte Carlo DropConnect, with marginal trade-off in performance. We perform a suite of experiments on CIFAR 10, CIFAR 100, and SVHN datasets with ResNet and VGG models. We further show how applying DropConnect to various layers in the network with different drop probabilities affects the networks performance and the entropy of the predictive distribution.


S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima

Sung, Wonyong, Choi, Iksoo, Park, Jinhwan, Choi, Seokhyun, Shin, Sungho

arXiv.org Machine Learning

The stochastic gradient descent (SGD) method is most widely used for deep neural network (DNN) training. However, the method does not always converge to a flat minimum of the loss surface that can demonstrate high generalization capability. Weight noise injection has been extensively studied for finding flat minima using the SGD method. We devise a new weight-noise injection-based SGD method that adds symmetrical noises to the DNN weights. The training with symmetrical noise evaluates the loss surface at two adjacent points, by which convergence to sharp minima can be avoided. Fixed-magnitude symmetric noises are added to minimize training instability. The proposed method is compared with the conventional SGD method and previous weight-noise injection algorithms using convolutional neural networks for image classification. Particularly, performance improvements in large batch training are demonstrated. This method shows superior performance compared with conventional SGD and weight-noise injection methods regardless of the batch-size and learning rate scheduling algorithms.