Wider Networks Learn Better Features
While the process of feature learning in deep neural networks remains poorly understood, numerous techniques have been developed for visualizing the features of trained networks in an attempt to better understand the learning process [10, 29, 26, 21]. These methods typically optimize over the input space, with appropriate regularization, to find an input that maximizes the activation of a single neuron or filter. Although such techniques are commonly applied to pre-trained state-of-the-art models, they can also be used to explore how various architectural choices shape the learned representations.

In this work, we specifically investigate the effect of the number of neurons on the quality of the learned representations. It has been shown that increasing the width of a neural network generally does not hurt performance on a given task, and can even improve it [23, 13], so it is natural to study how this modification affects the learned representations. We use the recently developed activation atlases technique [3] to visualize the inputs that activate an entire hidden state, as opposed to a single neuron. For wide networks, we find that the resulting visualizations differ dramatically from those of their narrower counterparts. Our main findings are:

- As we increase the number of neurons in a network, the features that an individual neuron responds to appear less like natural images. However, there exist directions in the hidden state space that are activated by natural-looking images.
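As a concrete sketch of the single-neuron visualization procedure described above, the following toy example performs gradient ascent over the input space to maximize one neuron's activation, with L2 regularization. A random linear layer `W` stands in for a trained network here, and the step count, learning rate, and regularization strength are illustrative assumptions, not the paper's setup; real use would backpropagate through a deep model.

```python
import numpy as np

# Hypothetical stand-in for a trained network: one linear layer
# mapping a 16-dimensional "image" to 4 "neurons".
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))

def activation_maximization(neuron, steps=200, lr=0.1, l2=0.01):
    """Gradient ascent on the input to maximize the pre-activation of
    one neuron, regularized so the input does not blow up.

    Objective: f(x) = W[neuron] @ x - l2 * ||x||^2
    """
    x = 0.01 * rng.standard_normal(16)  # small random starting input
    for _ in range(steps):
        # Analytic gradient of the regularized objective:
        # d/dx (W[neuron] @ x - l2 * ||x||^2) = W[neuron] - 2 * l2 * x
        grad = W[neuron] - 2 * l2 * x
        x += lr * grad
    return x

x_opt = activation_maximization(0)
```

In a real visualization pipeline the gradient would come from automatic differentiation through the full network, and the regularizer is often richer (e.g. penalties or transformations that bias the result toward natural-looking images), but the optimization loop has the same shape.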
Sep-25-2019