Goto

Collaborating Authors

 cifar-100 0


Appendix: Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM) AAnalysis of the DAMGate Function Dynamics During Training

Neural Information Processing Systems

In this section, we theoretically analyze the dynamics of the DAM mask gi at the i-th layer as the training process unfolds. The loss function for training the neural network for the target task can then be denoted as L= L(f(x,Θ,βi)) (e.g., cross-entropy loss for supervised structured pruning problems and reconstruction error for representation learning problems), where xdenotes the input features to the neural network. Using gradient descent methods with a learning rate of η, the expected update formula of βi in DAM is given by: βi = ηEx Dtr [ βiL(f(x,Θ,βi)) + λ βiβi/(l 1)] (2) = ηEx Dtr [ βiL(f(x,Θ,βi))] ηλ/(l 1) (3) Let hi be the layer output before applying the DAM mask, and the masked output be represented as oi = hi gi after applying the gate. For the j-th neuron, gij/ βi = 0 if and only if ξj(βi)/ βi = 0. Since tanh(z) has non-zero gradients for z >0, the gradient of ξj(βi) is 0 only when kj/ni + βi 0, i.e., the mask value of the neuron is 0 (or in other words, it is deactivated or dead). Let us denote the set of all neuron indices with non-zero mask values (also referred to as active neurons) as J. Equation 4 can then be simplified as: βiL(f(x,Θ,βi)) = αi X We can make the following two observations: (i) only those neurons that are active (i.e., have non-zero mask values) have a contribution towards updating βi and moving the gate function. We name these neurons as support neurons and their position in the ordering of neurons as the transitioning zone of the gate function.



Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM)

arXiv.org Artificial Intelligence

A central goal in deep learning is to learn compact representations of features at every layer of a neural network, which is useful for both unsupervised representation learning and structured network pruning. While there is a growing body of work in structured pruning, current state-of-the-art methods suffer from two key limitations: (i) instability during training, and (ii) need for an additional step of fine-tuning, which is resource-intensive. At the core of these limitations is the lack of a systematic approach that jointly prunes and refines weights during training in a single stage, and does not require any fine-tuning upon convergence to achieve state-of-the-art performance. We present a novel single-stage structured pruning method termed DiscriminAtive Masking (DAM). The key intuition behind DAM is to discriminatively prefer some of the neurons to be refined during the training process, while gradually masking out other neurons. We show that our proposed DAM approach has remarkably good performance over various applications, including dimensionality reduction, recommendation system, graph representation learning, and structured pruning for image classification. We also theoretically show that the learning objective of DAM is directly related to minimizing the L0 norm of the masking layer.


Complex-Valued Neural Networks for Privacy Protection

arXiv.org Machine Learning

This paper proposes a generic method to revise traditional neural networks for privacy protection. Our method is designed to prevent inversion attacks, i.e., avoiding recovering private information from intermediate-layer features of a neural network. Our method transforms real-valued features of an intermediate layer into complex-valued features, in which private information is hidden in a random phase of the transformed features. To prevent the adversary from recovering the phase, we adopt an adversarial-learning algorithm to generate the complex-valued feature. More crucially, the transformed feature can be directly processed by the deep neural network, but without knowing the true phase, people cannot recover either the input information or the prediction result. Preliminary experiments with various neural networks (including the LeNet, the VGG, and residual networks) on different datasets have shown that our method can successfully defend feature inversion attacks while preserving learning accuracy.