 resnet34


cd5404354496e39d37b7947d8a0d7b72-Supplemental-Conference.pdf

Neural Information Processing Systems

A.1 Additional Experiments on CIFAR10

We expanded our experiments on the CIFAR10 dataset by utilizing weights pretrained for 100 iterations with a batch size of 128 per iteration. The CIFAR10 dataset consists of 50,000 training images and 10,000 testing images, divided into 10 different classes. The results of these experiments are summarized in Table 1. We observed performance improvement relative to baseline. However, compared to other modes of pretraining for CIFAR10, certain PaI generators exhibited higher-than-expected standard deviation and lower average performance, indicating some instability in generating sparse structures. Specifically, we observed this trend with GraSP in ResNet18 and SNIP in ResNet34.


Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Neural Information Processing Systems

Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing.


CLDA: Contrastive Learning for Semi-Supervised Domain Adaptation (Supplementary Material)

Neural Information Processing Systems

The supplementary material consists of the following. Additional results on the DomainNet dataset for the 5- and 10-shot settings with ResNet34 as the backbone network are shown in Table 1. Results are reported in Tables 2 and 3. Discussion on Limitations and Societal Impacts. The architecture of the network is similar to [2]. All other hyperparameters used in our framework are described in the main paper.






Appendix for "Residual Alignment: Uncovering the Mechanisms of Residual Networks"

Neural Information Processing Systems

We start by providing motivation for the unconstrained Jacobians problem introduced in the main text. We will continue our proof using contradiction. Figure 1: Fully-connected ResNet34 (Type 1 model) trained on MNIST. Figure 2: Fully-connected ResNet34 (Type 1 model) trained on FashionMNIST. Figure 10: Fully-connected ResNet34 (Type 1 model) trained on MNIST. Figure 24: Fully-connected ResNet34 (Type 1 model) trained on MNIST.