### Stochastic Channel-Based Federated Learning for Medical Data Privacy Preserving

Artificial neural network has achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns and people want to take control over their sensitive information during both training and using processes. To address this problem, we propose a privacy-preserving method for the distributed system, Stochastic Channel-Based Federated Learning (SCBF), which enables the participants to train a high-performance model cooperatively without sharing their inputs. Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system, which selects the channels with regard to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process is applied to the algorithm based on the validation set, which serves as a model accelerator. In the experiment, our model presents better performances and higher saturating speed than the Federated Averaging method which reveals all the parameters of local models to the server when updating. We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process. And further improvement could be achieved by tuning the pruning rate. Our experiment shows that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUCROC performance and a reduction of 0.0068 in AUCPR.

### The Search for Sparse, Robust Neural Networks

Recent work on deep neural network pruning has shown there exist sparse subnetworks that achieve equal or improved accuracy, training time, and loss using fewer network parameters when compared to their dense counterparts. Orthogonal to pruning literature, deep neural networks are known to be susceptible to adversarial examples, which may pose risks in security- or safety-critical applications. Intuition suggests that there is an inherent trade-off between sparsity and robustness such that these characteristics could not co-exist. We perform an extensive empirical evaluation and analysis testing the Lottery Ticket Hypothesis with adversarial training and show this approach enables us to find sparse, robust neural networks. Code for reproducing experiments is available here: https://github.com/justincosentino/robust-sparse-networks.

### Pruned and Structurally Sparse Neural Networks

Advances in designing and training deep neural networks have led to the principle that the large and deeper a network is, the better it can perform. As a result, computational resources have become a key limiting factor in achieving better performance. One strategy to improve network capabilities while decreasing computation required is to replace dense fully-connected and convolutional layers with sparse layers. In this paper we experiment with training on sparse neural network topologies. First, we test pruning-based sparse topologies, which use a network topology obtained by initially training a dense network and then pruning low-weight connections. Second, we test RadiX-Nets, a class of sparse network structures with proven connectivity and sparsity properties. Results show that compared to dense topologies, sparse structures show promise in training potential but also can exhibit highly nonlinear convergence, which merits further study.

### Dynamic Network Surgery for Efficient DNNs

Deep learning has become a ubiquitous technology to improve machine intelligence. However, most of the existing deep models are structurally very complex, making them difficult to be deployed on the mobile platforms with limited computational power. In this paper, we propose a novel network compression method called dynamic network surgery, which can remarkably reduce the network complexity by making on-the-fly connection pruning. Unlike the previous methods which accomplish this task in a greedy way, we properly incorporate connection splicing into the whole process to avoid incorrect pruning and make it as a continual network maintenance. The effectiveness of our method is proved with experiments. Without any accuracy loss, our method can efficiently compress the number of parameters in LeNet-5 and AlexNet by a factor of $\bm{108}\times$ and $\bm{17.7}\times$ respectively, proving that it outperforms the recent pruning method by considerable margins. Code and some models are available at https://github.com/yiwenguo/Dynamic-Network-Surgery.

### On Iterative Neural Network Pruning, Reinitialization, and the Similarity of Masks

A BSTRACT We examine how recently documented, fundamental phenomena in deep learning models subject to pruning are affected by changes in the pruning procedure. Specifically, we analyze differences in the connectivity structure and learning dynamics of pruned models found through a set of common iterative pruning techniques, to address questions of uniqueness of trainable, high-sparsity sub-networks, and their dependence on the chosen pruning method. In convolutional layers, we document the emergence of structure induced by magnitude-based unstructured pruning in conjunction with weight rewinding that resembles the effects of structured pruning. We also show empirical evidence that weight stability can be automatically achieved through apposite pruning techniques. 1 I NTRODUCTION Deep neural architectures have seen a dramatic increase in size over the years (Amodei & Hernandez, 2018). While advantageous under this point of view, the proliferation of parameters in neural architectures may induce adverse consequences. The computational cost to train some state-of-the-art models has raised the barrier to entry for many researchers hoping to contribute. Because of limited memory, time, and compute, and to enable private, secure, on-device computation, methods for model compression have seen a rise in popularity. Among these are techniques for model pruning, quantization, and distillation. Pruning, in particular, has been seen as an overfitting avoidance method since the early decision tree literature (Breiman et al., 1984), with work such as that of Mingers (1989) comparing the effects of different tree pruning techniques.