We thank all the reviewers (R1, R2, R3) for their feedback and suggestions. However, this is not paired with an increase in classification accuracy. We will add the suggested experiment to Section 3.3. On CIFAR-100, we obtain an accuracy of 55.5%. The proposed loss requires a similarity to only one class during training. Following the reviewer's suggestion, we have quantified the relation; the results are shown in Table B. We will add the discussions of our proposal to Section 3.1.
Previous work has proposed many new loss functions and regularizers that improve test accuracy on image classification tasks. However, it is not clear whether these loss functions learn better representations for downstream tasks. This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. Using centered kernel alignment to measure similarity between hidden representations of networks, we find that differences among loss functions are apparent only in the last few layers of the network. We delve deeper into representations of the penultimate layer, finding that different objectives and hyperparameter combinations lead to dramatically different levels of class separation. Representations with higher class separation obtain higher accuracy on the original task, but their features are less useful for downstream tasks. Our results suggest there exists a trade-off between learning invariant features for the original task and features relevant for transfer tasks.
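The similarity measure mentioned above, centered kernel alignment (CKA), compares activation matrices from two layers or networks evaluated on the same examples. As a minimal sketch, the linear variant can be computed as below; the function name and the use of plain in-memory NumPy arrays are illustrative assumptions (the paper's experiments may use a minibatch or unbiased estimator for large networks).

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices.

    X: (n_examples, d1) and Y: (n_examples, d2) are activations of two
    layers/networks on the same n examples. Returns a value in [0, 1].
    """
    # Center each feature dimension across examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return numerator / denominator
```

Linear CKA is invariant to isotropic scaling and orthogonal rotation of the features, which is what makes it suitable for comparing layers of different widths trained with different objectives.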