Review for NeurIPS paper: Train-by-Reconnect: Decoupling Locations of Weights from Their Values

Feb-8-2025, 00:55:45 GMT–Neural Information Processing Systems

Weaknesses:
* I am not sure how novel or meaningful the analysis of "weight profiles" in Section 2 is. Judging from the provided code, the weight profiles in Figures 1 and 2 are plotted for the weights of an ImageNet-pretrained model loaded as: vgg16 = tf.keras.applications.vgg16.VGG16(include_top=True, weights="imagenet"). It would be important to know what hyperparameters were used in the training script for the pre-trained model. It is likely that the weights were initialized from a Gaussian and that weight decay was used for regularization. If so, the distribution of weights in the trained model may not differ greatly from the initial distribution (e.g., it may still be roughly Gaussian). One can obtain plots similar to Figure 2 by sorting random Gaussian samples: samples = np.random.normal(size=…). Alternatively, many distributions other than Gaussians could potentially yield heavy-tailed plots similar to Figures 1 and 2. A relevant paper looking at the distributions of trained network weights is [1].
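The point about sorting random Gaussian samples can be illustrated with a minimal sketch. It is not the authors' plotting code. The sample count, the seed, and the helper names `sorted_profile` and `quantile_summary` are assumptions made for illustration, and the standard-library `random.gauss` stands in for `np.random.normal` so that the snippet has no dependencies. The sketch sorts the samples and compares how steeply the sorted curve rises at its upper end versus around its middle.

```python
import random


def sorted_profile(n, mean=0.0, std=1.0, seed=0):
    """Draw n Gaussian samples and return them sorted in ascending order."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return sorted(rng.gauss(mean, std) for _ in range(n))


def quantile_summary(profile, quantiles=(0.01, 0.25, 0.5, 0.75, 0.99)):
    """Read off a few empirical quantiles from an already sorted profile."""
    n = len(profile)
    return {q: profile[min(n - 1, int(q * n))] for q in quantiles}


# 4096 is a hypothetical layer size, not a value taken from the paper.
profile = sorted_profile(4096)
summary = quantile_summary(profile)

n = len(profile)
# Rise of the sorted curve over the top 1% of indices ...
tail_rise = profile[-1] - profile[int(0.99 * n)]
# ... versus its rise over the central 1% of indices.
middle_rise = profile[int(0.505 * n)] - profile[int(0.495 * n)]

for q, value in summary.items():
    print(f"quantile {q:.2f}: {value:+.3f}")
print(f"rise over top 1%:     {tail_rise:.3f}")
print(f"rise over central 1%: {middle_rise:.3f}")
```

Plotting `profile` against its index gives a curve that is nearly flat in the middle and steep at both ends. That shape comes from sorting draws from a bell-shaped distribution, so on its own it need not reflect anything learned during training.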


