Incorporating Visual Cortical Lateral Connection Properties into CNN: Recurrent Activation and Excitatory-Inhibitory Separation

Park, Jin Hyun, Zhang, Cheng, Choe, Yoonsuck

arXiv.org Artificial Intelligence 

Biologically motivated neural networks for visual processing, such as the Neocognitron [Fukushima, 1980], Convolutional Neural Networks (CNNs) [LeCun et al., 1989], and HMAX [Riesenhuber and Poggio, 1999], drew inspiration from Hubel and Wiesel's work on primary visual cortical neurons [Hubel and Wiesel, 1959] and subsequent developments in the field. A common feature of these models is that alternating layers of simple cells and complex cells form a feed-forward hierarchy, starting with the afferent connections from the input (in CNNs, the convolutional layers and pooling layers may serve the same purpose [Lindsay, 2021]). The hierarchy in these models loosely mimics the projections among different cortical areas in the visual pathway [Felleman and Van Essen, 1991], with each convolutional layer corresponding to a distinct visual cortical area and the connections serving as the long-range projections. Functionally, feature representations in CNNs also show close similarity to those in the ventral visual pathway [Zeiler, 2014].

This analogy has a major shortcoming: the visual cortical areas do not form a strict hierarchy, since feedback connections between the visual areas form recurrent loops [Briggs, 2020]. Some architectural features in modern CNN variants may serve this purpose. For instance, Liao and Poggio [Liao and Poggio, 2016] proposed that skip connections in ResNet [He et al., 2016] can be seen as implementing such recurrent projections (also see Recurrent CNN: [Liang and Hu, 2015]).
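The equivalence Liao and Poggio point to can be illustrated with a minimal sketch: if the residual blocks of a ResNet share ("tie") their weights, stacking T blocks computes exactly the same function as iterating one recurrent update for T time steps. The function and variable names below are illustrative, and a small matrix multiply with ReLU stands in for a full convolutional block; this is a toy demonstration of the unrolling argument, not an implementation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared ("weight-tied") parameters, standing in for one conv block's weights.
W = rng.standard_normal((4, 4)) * 0.1

def f(h):
    # Minimal stand-in for a conv + ReLU residual branch.
    return np.maximum(W @ h, 0.0)

def resnet_stack(x, depth):
    # Feed-forward ResNet with weight-tied blocks: h <- h + f(h) at each layer.
    h = x
    for _ in range(depth):
        h = h + f(h)
    return h

def recurrent_net(x, steps):
    # The same update viewed as a recurrence iterated over time.
    h = x
    for _ in range(steps):
        h = h + f(h)
    return h

x = rng.standard_normal(4)
# With tied weights, a depth-T ResNet equals the recurrence unrolled T steps.
assert np.allclose(resnet_stack(x, 3), recurrent_net(x, 3))
```

In a standard ResNet the blocks have independent weights, so the correspondence is only approximate; the weight-tied case makes the recurrent interpretation exact.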