CNN²: Viewpoint Generalization via a Binocular Vision
–Neural Information Processing Systems
Convolutional neural networks (CNNs) have laid the foundation for many techniques in various applications. Although CNNs achieve remarkable performance in some tasks, their 3D viewpoint generalizability is still far behind human visual capabilities. Recent efforts, such as Capsule Networks, have been made to address this issue, but these new models are either hard to train or incompatible with existing CNN-based techniques specialized for different applications. Observing that humans use binocular vision to understand the world, we study in this paper whether the 3D viewpoint generalizability of CNNs can be achieved via binocular vision. We propose CNN², a CNN that takes two images as input, resembling the process of an object being viewed from the left eye and the right eye.
Oct-11-2024, 01:44:59 GMT