Understanding Feature Transfer Through Representation Alignment

Ehsan Imani, Wei Hu, Martha White

arXiv.org Artificial Intelligence 

Abstract--Training with the true labels of a dataset, as opposed to randomized labels, leads to faster optimization and better generalization. This difference is attributed to a notion of alignment between inputs and labels in natural datasets. We find that training neural networks with different architectures and optimizers on random or true labels enforces the same relationship between the hidden representations and the training labels, elucidating why neural network representations have been so successful for transfer. We first highlight why aligned features promote transfer, and show in a classic synthetic transfer problem that alignment is the determining factor for positive and negative transfer to similar and dissimilar tasks. We then investigate a variety of neural network architectures and find that (a) alignment emerges across a variety of different architectures and optimizers, with more alignment arising from depth, (b) alignment increases for layers closer to the output, and (c) existing high-performance deep CNNs exhibit high levels of alignment.

A common recipe for transfer is to train a neural network on the source task, take the hidden representations of that network, and finally train a subsequent model on the target task using those representations. The premise is that neural networks adapt their intermediate representations--hidden representations--to the source task and, due to the commonalities between the two tasks, these learned representations help training on the target task [1]. Availability of large datasets like ImageNet [2] and the News Dataset for Word2Vec [3] provides suitable source tasks that facilitate using neural networks for feature construction for Computer Vision and Natural Language Processing (NLP).

One direction has been to analyze what abstractions the network has learned, agnostic to exactly how it is represented. Shwartz et al. [12] studied neural networks through the lens of information theory and found that, during training, the network preserves the information necessary for predicting the output while throwing away unnecessary information. Using representational similarity matrices, [13] found that on synthetic datasets where task-relevance of features can be controlled, learned hidden representations suppress task-irrelevant features and enhance task-relevant features.
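To make the abstract's notion of alignment between hidden representations and training labels concrete, the sketch below measures how much of a label vector lies in the span of the top singular directions of a representation matrix. This is one plausible instantiation under stated assumptions, not necessarily the paper's exact metric; the function name, the top-k cutoff, and the toy data are illustrative.

```python
import numpy as np

def alignment(features: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of the label vector's energy lying in the span of the
    top-k left singular vectors of the feature matrix.

    features: (n, d) hidden representations of n training inputs.
    labels:   (n,) training labels (e.g., +/-1 for a binary task).
    k:        number of leading singular directions to project onto.
    """
    u, s, vt = np.linalg.svd(features, full_matrices=False)
    y = labels / np.linalg.norm(labels)   # unit-norm label vector
    proj = u[:, :k].T @ y                 # coordinates in the top-k directions
    return float(np.sum(proj ** 2))       # in [0, 1]; higher means more aligned

# Toy usage: random features are only weakly aligned with random labels.
rng = np.random.default_rng(0)
phi = rng.normal(size=(200, 50))
y = rng.choice([-1.0, 1.0], size=200)
print(alignment(phi, y, k=5))  # roughly k/n for unstructured data
```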
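The transfer recipe described above (train on a source task, freeze the learned representation, fit a new model on the target task) can be sketched end to end. This is a minimal numpy illustration of the recipe, not the paper's experimental setup; the architecture, learning rate, ridge penalty, and synthetic tasks are all assumed for brevity, and the same inputs serve both tasks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 500, 20, 64

# Source task: pre-train a one-hidden-layer network with plain gradient descent.
X = rng.normal(size=(n, d))
w_src = rng.normal(size=d)
y_src = np.tanh(X @ w_src)                    # synthetic source labels

W1 = rng.normal(size=(d, h)) / np.sqrt(d)     # hidden layer
w2 = rng.normal(size=h) / np.sqrt(h)          # output head
lr = 0.05
for _ in range(2000):
    H = np.tanh(X @ W1)                       # hidden representations
    err = H @ w2 - y_src
    w2 -= lr * H.T @ err / n
    W1 -= lr * X.T @ (np.outer(err, w2) * (1 - H ** 2)) / n

# Target task: freeze the representation and fit only a new linear head (ridge).
y_tgt = np.tanh(X @ (w_src + 0.1 * rng.normal(size=d)))  # similar target task
H = np.tanh(X @ W1)
head = np.linalg.solve(H.T @ H + 1e-3 * np.eye(h), H.T @ y_tgt)
print("target MSE:", np.mean((H @ head - y_tgt) ** 2))
```

If the target labels are made dissimilar to the source (e.g., generated from an unrelated weight vector), the frozen features fit them noticeably worse, which is the intuition behind positive versus negative transfer in the abstract.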
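The representational-similarity analysis attributed to [13] can be approximated as follows. This sketch assumes a cosine-similarity RSM and Pearson correlation between off-diagonal RSM entries as the comparison; [13] may use a different similarity measure, so treat it as a generic illustration of how one checks whether a representation tracks task-relevant rather than task-irrelevant features.

```python
import numpy as np

def rsm(reps: np.ndarray) -> np.ndarray:
    """Representational similarity matrix: cosine similarity between the
    representations of every pair of inputs."""
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    return normed @ normed.T

def rsm_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between the off-diagonal entries of two RSMs,
    a standard way to compare two representations of the same inputs."""
    mask = ~np.eye(a.shape[0], dtype=bool)
    return float(np.corrcoef(a[mask], b[mask])[0, 1])

# Toy usage: a representation that (hypothetically) suppresses a nuisance
# feature has an RSM resembling the relevant feature's RSM, not the nuisance's.
rng = np.random.default_rng(1)
relevant = rng.normal(size=(100, 2))      # features the labels depend on
irrelevant = rng.normal(size=(100, 2))    # task-irrelevant nuisance features
reps = np.hstack([relevant, 0.1 * irrelevant])
print(rsm_correlation(rsm(reps), rsm(relevant)))    # high
print(rsm_correlation(rsm(reps), rsm(irrelevant)))  # near zero
```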