Magnitude and Uncertainty Pruning Criterion for Neural Networks
Ko, Vinnie, Oehmcke, Stefan, Gieseke, Fabian
Neural networks have achieved dramatic improvements in recent years and now represent the state of the art for many real-world tasks. One drawback, however, is that many of these models are overparameterized, which makes them both computationally and memory intensive. Overparameterization can also lead to undesired overfitting. Inspired by recently proposed magnitude-based pruning schemes and the Wald test from statistics, we introduce a novel magnitude and uncertainty (M&U) pruning criterion that helps to lessen these shortcomings. One important advantage of our M&U pruning criterion is that it is scale-invariant, whereas the magnitude-based pruning criterion is sensitive to the scale of the weights. In addition, we present a "pseudo bootstrap" scheme, which efficiently estimates the uncertainty of the weights from their update information during training. Our experimental evaluation, based on various neural network architectures and datasets, shows that our new criterion leads to more compressed models than pruning based solely on magnitude, while at the same time losing less predictive power.
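The abstract does not state the M&U formula, so the following is only a minimal sketch under stated assumptions: the score is taken to be a Wald-style ratio of a weight's magnitude to its estimated standard error, and the "pseudo bootstrap" is approximated by the standard deviation of weight snapshots recorded during training. All function names are hypothetical. The sketch mainly illustrates why such a ratio is scale-invariant while a plain magnitude score is not.

```python
import numpy as np

def magnitude_scores(weights):
    """Plain magnitude criterion: score = |w| (depends on the scale of w)."""
    return np.abs(weights)

def pseudo_bootstrap_std(weight_history):
    """ASSUMPTION: uncertainty is estimated as the sample standard deviation
    of weight snapshots, shape (n_snapshots, n_weights)."""
    return np.std(weight_history, axis=0, ddof=1)

def wald_style_scores(weights, std_errors, eps=1e-12):
    """ASSUMPTION: Wald-style score |w| / SE(w); eps avoids division by zero."""
    return np.abs(weights) / (std_errors + eps)

def prune_mask(scores, sparsity):
    """Return a boolean mask that drops the `sparsity` fraction of weights
    with the lowest scores (True = keep)."""
    k = int(np.floor(sparsity * scores.size))
    mask = np.ones(scores.size, dtype=bool)
    if k > 0:
        lowest = np.argsort(scores, kind="stable")[:k]
        mask[lowest] = False
    return mask

# Usage: rescaling the weights changes magnitude scores but not the ratio.
rng = np.random.default_rng(0)
history = rng.normal(size=(20, 5))   # hypothetical weight snapshots
w = history[-1]
ratio = wald_style_scores(w, pseudo_bootstrap_std(history))
ratio_scaled = wald_style_scores(100 * w, pseudo_bootstrap_std(100 * history))
```

Here `ratio` and `ratio_scaled` agree up to floating-point error, whereas `magnitude_scores(100 * w)` is 100 times `magnitude_scores(w)`.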
Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information
Pfannschmidt, Lukas, Jakob, Jonathan, Hinder, Fabian, Biehl, Michael, Tino, Peter, Hammer, Barbara
Advances in machine learning technologies have led to increasingly powerful models, in particular in the context of big data. Yet many application scenarios demand robustly interpretable models rather than optimum model accuracy; this is the case, for example, if potential biomarkers or causal factors are to be discovered from a set of given measurements. In this contribution, we focus on feature selection paradigms, which enable us to uncover relevant factors of a given regularity based on a sparse model. We focus on the important specific setting of linear ordinal regression, i.e. data have to be ranked into one of a finite number of ordered categories by a linear projection. Unlike previous work, we consider the case that features are potentially redundant, such that no unique minimum set of relevant features exists. We aim to identify all strongly and all weakly relevant features as well as their type of relevance (strong or weak); we achieve this goal by determining feature relevance bounds, which correspond to the minimum and maximum feature relevance, respectively, when searched over all equivalent models. In addition, we discuss how this setting enables us to substitute some of the features, e.g. due to their semantics, and how to extend the framework of feature relevance intervals to the setting of privileged information, i.e. potentially relevant information that is available for training purposes only and cannot be used for the prediction itself.
Integration of Neural Network-Based Symbolic Regression in Deep Learning for Scientific Discovery
Kim, Samuel, Lu, Peter, Mukherjee, Srijon, Gilbert, Michael, Jing, Li, Ceperic, Vladimir, Soljacic, Marin
Symbolic regression is a powerful technique that can discover analytical equations that describe data, which can lead to explainable models and generalizability outside of the training data set. In contrast, neural networks have achieved amazing levels of accuracy on image recognition and natural language processing tasks, but are often seen as black-box models that are difficult to interpret and typically extrapolate poorly. Here we use a neural network-based architecture for symbolic regression that we call the Equation Learner (EQL) network and integrate it with other deep learning architectures such that the whole system can be trained end-to-end through backpropagation. To demonstrate the power of such systems, we study their performance on several substantially different tasks. First, we show that the neural network can perform symbolic regression and learn the form of several functions. Next, we present an MNIST arithmetic task where a separate part of the neural network extracts the digits. Finally, we demonstrate prediction of dynamical systems where an unknown parameter is extracted through an encoder. We find that the EQL-based architecture can extrapolate quite well outside of the training data set compared to a standard neural network-based architecture, paving the way for deep learning to be applied in scientific exploration and discovery. Many complex phenomena in science and engineering can be reduced to general models that can be described in terms of relatively simple mathematical equations. For example, classical electrodynamics can be described by Maxwell's equations and non-relativistic quantum mechanics can be described by the Schrödinger equation. These models elucidate the underlying dynamics of a particular system and can provide general predictions over a very wide range of conditions.
On the other hand, modern machine learning techniques have become increasingly powerful for many tasks including image recognition and natural language processing, but the neural network-based architectures in these state-of-the-art techniques are black-box models, which often makes them difficult to use in scientific exploration.
Transparent Classification with Multilayer Logical Perceptrons and Random Binarization
Wang, Zhuo, Zhang, Wei, Liu, Ning, Wang, Jianyong
Models with transparent inner structure and high classification performance are required to reduce potential risk and provide trust for users in domains like health care, finance, security, etc. However, existing models struggle to satisfy both properties simultaneously. In this paper, we propose a new hierarchical rule-based model for classification tasks, named Concept Rule Sets (CRS), which has both a strong expressive ability and a transparent inner structure. To address the challenge of efficiently learning the non-differentiable CRS model, we propose a novel neural network architecture, Multilayer Logical Perceptron (MLLP), which is a continuous version of CRS. Using MLLP and the Random Binarization (RB) method we propose, we can search for the discrete solution of CRS in continuous space using gradient descent and ensure that the discrete CRS acts almost the same as the corresponding continuous MLLP. Experiments on 12 public data sets show that CRS outperforms the state-of-the-art approaches and that the complexity of the learned CRS is close to that of a simple decision tree.
Introduction
Relying on its strong ability to model data, machine learning, especially deep learning, has become the main paradigm for decision-making systems (Goodfellow et al. 2016; Doshi-Velez and Kim 2017). Decision-making systems are widely used in important areas such as medicine, finance, politics, and law, where people need explanations of why decisions are made to ensure their safety and protect their rights (Goodman and Flaxman 2016; Lipton 2016). As a result, the demand for transparency of machine learning methods is increasing, which is crucial for earning the trust of users (Doshi-Velez and Kim 2017) and reducing potential risks and bugs (Chu et al. 2018). However, most machine learning models can hardly ensure good predictive ability and transparency at the same time, and sacrificing transparency for good performance could result in serious consequences.
Reconstructing Multi-echo Magnetic Resonance Images via Structured Deep Dictionary Learning
Singhal, Vanika, Majumdar, Angshul
Multi-echo magnetic resonance (MR) images are acquired by changing the echo times (for T2 weighted) or relaxation times (for T1 weighted) of scans. The resulting (multi-echo) images are usually used for quantitative MR imaging. Acquiring MR images is a slow process, and acquiring multiple scans of the same cross section for multi-echo imaging is even slower. In order to accelerate the scan, compressed sensing (CS) based techniques advocate partial k-space (Fourier domain) scans; the resulting images are reconstructed via structured CS algorithms. Recently, it has been shown that, instead of using off-the-shelf CS, better results can be obtained by adaptive reconstruction algorithms based on structured dictionary learning. In this work, we show that the reconstruction results can be further improved by using structured deep dictionaries. Experimental results on real datasets show that by using our proposed technique the scan time can be cut by half compared to the state of the art.
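To make the partial k-space acquisition concrete, the sketch below simulates it on a synthetic image: it keeps a random subset of k-space rows and applies a naive zero-filled inverse FFT. This is not the structured deep dictionary reconstruction of the paper, only an illustration of the undersampled measurement model that such methods start from. The function names, the row-wise random mask, and the test image are assumptions made for illustration.

```python
import numpy as np

def undersample_kspace(image, keep_fraction, seed=0):
    """Simulate a partial k-space scan by keeping a random subset of rows
    of the 2-D Fourier transform. ASSUMPTION: row-wise random sampling."""
    rng = np.random.default_rng(seed)
    kspace = np.fft.fft2(image)
    n_rows = image.shape[0]
    n_keep = max(1, int(round(keep_fraction * n_rows)))
    rows = rng.choice(n_rows, size=n_keep, replace=False)
    mask = np.zeros(image.shape, dtype=bool)
    mask[rows, :] = True
    return kspace * mask, mask

def zero_filled_reconstruction(kspace_partial):
    """Naive baseline: inverse FFT with unsampled entries left at zero."""
    return np.abs(np.fft.ifft2(kspace_partial))

# Usage on a synthetic non-negative 8x8 "image".
image = np.random.default_rng(1).random((8, 8))
k_half, mask_half = undersample_kspace(image, keep_fraction=0.5)
recon_half = zero_filled_reconstruction(k_half)
k_full, mask_full = undersample_kspace(image, keep_fraction=1.0)
recon_full = zero_filled_reconstruction(k_full)
```

With the full mask the image is recovered up to floating-point error; with half of the rows the zero-filled result shows aliasing, which is the gap that CS and dictionary-learning reconstructions aim to close.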
Backprop Diffusion is Biologically Plausible
Betti, Alessandro, Gori, Marco
The Backpropagation algorithm relies on the abstraction of using a neural model that gets rid of the notion of time, since the input is mapped instantaneously to the output. In this paper, we claim that this abstraction of ignoring time, along with the abrupt input changes that occur when feeding the training set, are in fact the reasons why, in some papers, Backprop biological plausibility is regarded as an arguable issue. We show that as soon as a deep feedforward network operates with neurons with time-delayed response, the backprop weight update turns out to be the basic equation of a biologically plausible diffusion process based on forward-backward waves. We also show that such a process very well approximates the gradient for inputs that are not too fast with respect to the depth of the network. These remarks somewhat disclose the diffusion process behind the backprop equation and lead us to interpret the corresponding algorithm as a degeneration of a more general diffusion process that takes place also in neural networks with cyclic connections.
Classification under local differential privacy
Berrett, Thomas, Butucea, Cristina
Despite the long history of this problem there are still many open problems and it remains an active topic of research. Recent work has focused on weakening commonly made assumptions [4], studying situations in which the training data comes from a different distribution to the test data [3, 5], and making predictions under constraints on allowable classifiers [16]. In recent years, it has become clear that in certain studies there is a need to preserve the privacy of the individuals whose data is collected. As a way of formalising the problem, the framework of differential privacy, see [9] and [10], has prevailed as a natural solution.
Accurate Entrance Position Detection Based on Wi-Fi and GPS Signals Using Machine Learning
This paper aims at accurately detecting the position of the main entrance of buildings. The proposed approach relies on the fact that GPS signals drop significantly when the user enters a building. Moreover, as most public buildings provide Wi-Fi services, the Wi-Fi received signal strength (RSS) can be utilized to detect the entrance of a building. The rationale behind this paper is that the GPS signal decreases and the Wi-Fi signal increases as the user approaches the main entrance. Several real experiments have been conducted to demonstrate the feasibility of the proposed approach.
Expansion of Cyber Attack Data From Unbalanced Datasets Using Generative Techniques
Machine learning techniques help to understand patterns in a dataset in order to create a defense mechanism against cyber attacks. However, it is difficult to construct a theoretical model for discriminating attacks from the overall dataset because of imbalances in the dataset. A Multilayer Perceptron (MLP) improves the accuracy and performance of distinguishing attack data from benign data when trained on a balanced dataset. We worked on the publicly available UGR'16 dataset. Data wrangling was performed to prepare a test set from the original set. We fed the neural network classifier inputs of increasing size (i.e. 10000, 50000, 1 million) to see how the distribution of features affects the accuracy. We implemented a GAN model that can produce samples of different attack labels (e.g. blacklist, anomaly spam, ssh scan). We were able to generate as many samples as necessary based on the data sample we took from UGR'16. We tested the accuracy of our model first with the imbalanced dataset and then with an increased number of attack samples, and found improved classification performance for the latter.
Exact expressions for double descent and implicit regularization via surrogate random design
Dereziński, Michał, Liang, Feynman, Mahoney, Michael W.
Double descent refers to the phase transition that is exhibited by the generalization error of unregularized learning models when varying the ratio between the number of parameters and the number of training samples. The recent success of highly over-parameterized machine learning models such as deep neural networks has motivated a theoretical analysis of the double descent phenomenon in classical models such as linear regression which can also generalize well in the over-parameterized regime. We build on recent advances in Randomized Numerical Linear Algebra (RandNLA) to provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator. Our approach involves constructing what we call a surrogate random design to replace the standard i.i.d. design of the training sample. This surrogate design admits exact expressions for the mean squared error of the estimator while preserving the key properties of the standard design. We also establish an exact implicit regularization result for over-parameterized training samples. In particular, we show that, for the surrogate design, the implicit bias of the unregularized minimum norm estimator precisely corresponds to solving a ridge-regularized least squares problem on the population distribution.
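As background for the terms used above, the sketch below computes the minimum norm linear estimator in an over-parameterized setting and compares it with a ridge estimator whose penalty is nearly zero. It illustrates only the standard fact that the minimum norm solution is the limit of ridge regression as the penalty vanishes; it does not reproduce the surrogate random design or the paper's exact expressions. The data, dimensions, and function names are assumptions chosen for illustration.

```python
import numpy as np

def min_norm_estimator(X, y):
    """Minimum norm least squares solution via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ y

def ridge_estimator(X, y, lam):
    """Ridge-regularized least squares: (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Over-parameterized toy problem: more parameters (d) than samples (n).
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

w_min_norm = min_norm_estimator(X, y)            # interpolates the training data
w_ridge_small = ridge_estimator(X, y, lam=1e-6)  # nearly unregularized ridge
w_ridge_large = ridge_estimator(X, y, lam=1.0)   # stronger shrinkage
```

In this toy setting `w_min_norm` fits the training data exactly and is numerically close to `w_ridge_small`, while `w_ridge_large` has a smaller norm and no longer interpolates.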