Reviews: Improved Expressivity Through Dendritic Neural Networks

Neural Information Processing Systems 

This paper presents D-Nets, an architecture loosely inspired by the dendrites of biological neurons. In a D-Net, each neuron receives input from the previous layer as the max-pool of linear combinations of disjoint random subsets of that layer's outputs. The authors show that this approach outperforms self-normalizing neural networks and other advanced approaches on the UCI collection of datasets (and also outperforms simple non-convolutional approaches on MNIST and CIFAR). They provide an intuition that greater fan-in to non-linearities leads to a greater number of linear regions and thus, perhaps, greater expressivity. I am still quite surprised that such a simple method performs so well, but the experimental setup seems sound. Some questions remain open; for example, how does the optimal number of branches grow with the size of the layer?
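To make the layer described above concrete, here is a minimal sketch of how I read it, not the authors' implementation. The function names (`make_dendritic_layer`, `dendritic_forward`), the Gaussian initialization, the absence of bias terms, and the choice to draw a fresh random partition for each neuron are all my assumptions; the paper may differ on any of these.

```python
import random


def make_dendritic_layer(n_in, n_out, n_branches, seed=0):
    """Build parameters for one dendritic layer (hypothetical helper).

    Each output neuron gets its own random partition of the input
    indices into `n_branches` disjoint subsets, plus one weight per
    input index. Initialization and per-neuron partitioning are
    assumptions made for illustration.
    """
    rng = random.Random(seed)
    layer = []
    for _ in range(n_out):
        idx = list(range(n_in))
        rng.shuffle(idx)
        # Deal the shuffled indices round-robin into disjoint branches.
        branches = [idx[b::n_branches] for b in range(n_branches)]
        weights = [[rng.gauss(0.0, 1.0) for _ in branch] for branch in branches]
        layer.append((branches, weights))
    return layer


def dendritic_forward(layer, x):
    """Each neuron outputs the max over its branches' linear combinations."""
    out = []
    for branches, weights in layer:
        branch_sums = [
            sum(w * x[i] for w, i in zip(ws, branch))
            for branch, ws in zip(branches, weights)
        ]
        out.append(max(branch_sums))
    return out


layer = make_dendritic_layer(n_in=8, n_out=3, n_branches=2)
y = dendritic_forward(layer, [1.0] * 8)
```

With `n_branches=1` the max is taken over a single branch, so the layer reduces to an ordinary linear map; the non-linearity comes entirely from the max over several branches, which is where the question about the optimal number of branches relative to layer width arises.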