Appearance invariance in convolutional networks with neighborhood similarity

Tolga Tasdizen, Mehdi Sajjadi, Mehran Javanmardi, Nisha Ramesh

arXiv.org Machine Learning 

The recent successes of deep learning are partly attributed to supervised training of networks with large numbers of parameters using large datasets. In computer vision, supervised training of convolutional networks with very large labeled datasets provides state-of-the-art solutions in many applications such as object recognition, image captioning and question answering. While it has been shown that convolutional networks have low generalization error, their generalization capability does not extend to samples that are not adequately represented by the training data. A potential source of mismatch between the training data distribution and new samples is appearance. To a human, the images shown in Figure 3 (top row) unambiguously represent the digits "4", "2" and "6", whereas a convolutional network trained on the original MNIST dataset has a low probability of producing the correct answer for the modified digit images. A human finds this task easy not because of prior exposure to the particular representations of the digits shown in Figure 3, but because humans can adapt to novel appearances of learned concepts. Invariances to a predetermined set of transformations such as translation, rotation, contrast and noise can be taught to the network via methods such as tangent prop [1] and data augmentation [2]; however, these methods cannot adapt to new appearances such as those shown in Figure 3. Similarly, domain adaptation [3, 4] offers a solution only if a sufficient number of images in the target domain is available.
