An intriguing failing of convolutional neural networks and the CoordConv solution

Rosanne Liu, Joel Lehman, Piero Molino, Felipe Petroski Such, Eric Frank, Alex Sergeev, Jason Yosinski

Neural Information Processing Systems 

Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and coordinates in one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. This fix, which we call CoordConv, works by giving convolution access to its own input coordinates through the use of extra coordinate channels.
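To make the idea concrete, the following is a minimal PyTorch sketch of a CoordConv-style layer: two channels holding the pixels' own (x, y) coordinates are concatenated to the input before a standard convolution. The class name, the normalization of coordinates to [-1, 1], and the example shapes are illustrative choices, not details specified in this abstract.

```python
import torch
import torch.nn as nn


class CoordConv2d(nn.Module):
    """Conv2d preceded by concatenation of normalized (x, y) coordinate channels."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinate maps.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        batch, _, height, width = x.shape
        # Coordinate grids normalized to [-1, 1], one channel per axis.
        ys = torch.linspace(-1.0, 1.0, height, device=x.device)
        xs = torch.linspace(-1.0, 1.0, width, device=x.device)
        grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack((grid_x, grid_y)).unsqueeze(0).expand(batch, -1, -1, -1)
        # Concatenate coordinates to the feature maps, then convolve as usual.
        return self.conv(torch.cat((x, coords), dim=1))


# Example: a coordinate-aware layer applied to a batch of 64x64 single-channel images,
# as in the one-hot pixel representation used by the coordinate transform task.
layer = CoordConv2d(in_channels=1, out_channels=8, kernel_size=1)
out = layer(torch.zeros(4, 1, 64, 64))
print(out.shape)  # torch.Size([4, 8, 64, 64])
```

Because the coordinate channels break translation invariance only where the network chooses to use them, the layer can still behave like an ordinary convolution if the learned weights on those channels are zero.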