Reviews: Learning Spherical Convolution for Fast Features from 360° Imagery

Neural Information Processing Systems

This paper describes a method to transform networks trained on perspective images so that they can take spherical images as input. This is an important problem, as fisheye and 360-degree sensors are becoming increasingly ubiquitous while training data for them remains relatively scarce. The method first transforms the network architecture, adapting the filter sizes and pooling operations to convolutions on an equirectangular representation/projection. Next, the filters are learned to match the feature responses of the original network on projections onto the tangent plane at each feature location. The filters are pre-trained layer by layer and then fine-tuned so that the output features match those of the original network on the tangent planes as closely as possible. Detection experiments on Pano2Vid and PASCAL demonstrate that the technique performs slightly below the optimal accuracy obtained with per-pixel tangent-plane projections (while being significantly faster) and outperforms several baselines, including cube-map projections.
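To make the architectural adaptation concrete, the sketch below illustrates why equirectangular convolution needs row-dependent kernel sizes: near the poles each pixel spans a wider horizontal angle, so the effective receptive field must grow with latitude. This is a minimal illustration under assumed conventions, not the paper's method; the helper names (`row_kernel_width`, `spherical_conv`) are hypothetical, and simple box filters stand in for the per-row filters that the paper learns by matching the source network's tangent-plane responses.

```python
import numpy as np

def row_kernel_width(row, height, base_width=3, max_width=15):
    # Latitude of this row, in [-pi/2, pi/2] (equirectangular convention).
    lat = (row + 0.5) / height * np.pi - np.pi / 2
    # Widen the kernel by the inverse cosine of latitude (a common
    # heuristic; the paper instead learns the per-row filters directly).
    width = int(np.round(base_width / max(np.cos(lat), 1e-3)))
    return min(width | 1, max_width)  # keep the width odd and bounded

def spherical_conv_row(image_row, kernel):
    # 1-D horizontal convolution with wrap-around padding, since an
    # equirectangular image is periodic in longitude.
    k = len(kernel)
    pad = k // 2
    padded = np.concatenate([image_row[-pad:], image_row, image_row[:pad]])
    return np.array([np.dot(padded[i:i + k], kernel)
                     for i in range(len(image_row))])

def spherical_conv(image, base_width=3):
    # Apply a row-specific kernel to every row of the equirectangular
    # image; here each kernel is a normalized box filter.
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for r in range(h):
        kw = row_kernel_width(r, h, base_width)
        kernel = np.full(kw, 1.0 / kw)
        out[r] = spherical_conv_row(image[r], kernel)
    return out
```

In the paper, the analogous per-row kernels are not hand-designed but learned layer by layer so that their responses reproduce those of the perspective network on tangent-plane crops; the sketch only captures the structural point that one shared kernel cannot serve all rows.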